I. Introduction

Wireless relaying systems have recently been considered an attractive option because of their potential to improve system throughput, enhance cell-edge performance, extend cell coverage, and reduce the overall deployment cost [1]-[7]. As such, they have been considered in the standardization of IEEE 802.16j, IEEE 802.16m, and 3GPP LTE-Advanced [8]-[11]. Most nodes in a relaying system operate in half-duplex mode, which means that they cannot transmit and receive simultaneously, i.e., they cannot act as a transmitter and a receiver at the same time. This inherent property of half-duplex operation requires two time slots in two-hop relaying networks: the source transmits the signal to the relays during the first time slot (phase), and the relays forward the received signal to the destination during the second time slot (phase). As a result, the half-duplex protocol loses a factor of 1/2 in the capacity pre-log factor.¹

There has been steady interest in overcoming this inherent disadvantage of half-duplex relaying systems. First, the incremental relaying protocol was proposed in [6]. The source first broadcasts its message, and the relay retransmits it, exploiting spatial diversity, only when the destination fails to decode the message received over the direct link between source and destination. In the non-orthogonal amplify-and-forward (NAF) protocol [12], [13], the source transmits a new message to the destination during the second time slot. This cooperative relaying system over two time slots can be equivalently modeled as a multiple-input multiple-output (MIMO) system, which can compensate for the loss of capacity pre-log factor in half-duplex mode [12]. The aforementioned methods assume that direct transmission from source to destination is available. When there is no direct link between source and destination, due to deep fading or blockage by obstacles, two-way relaying and two-path relaying have been proposed [14]. In the two-way relaying protocol, a bidirectional connection between source and destination is established to compensate for the loss in capacity pre-log factor.

On the other hand, the two-path relaying protocol staggers the phases of two relays so that the source alternately transmits its signals to the destination via different relays: while one relay receives the signal from the source, the other relay forwards the previous message to the destination. In this protocol, the desired signal forwarded to the destination acts as inter-relay interference at the relay in receiving mode. In [14], the destination uses successive decoding with successive interference cancelation, and the method gives good performance improvement only for a weak to moderate inter-relay channel. The authors in [15] proposed canceling the inter-relay self-interference at one of the relays and highlighted that their method is still robust even for a strong inter-relay channel. These works on two-path relaying [14], [15] focused on a single-antenna environment and have been extended to a multiple-antenna scenario in [16], [17]. The work in [16] exploits two relays with multiple antennas to cancel the inter-relay interference when the two relays perform alternate relaying; however, this method cannot recover the loss of capacity pre-log factor and only provides a signal-to-noise ratio (SNR) gain. Even though the schemes proposed in [17] enhance the capacity pre-log factor, inter-relay interference is not considered thoroughly, since it is assumed to be blocked by large obstacles.

¹In general, the capacity pre-log factor is also referred to as the degree of freedom (DOF).

In [18], we briefly investigated a decode-and-forward (DF) alternate relaying system with three relays. The scheme was shown to partially compensate for the loss of DOFs by aligning the inter-relay interference from different nodes and thereby creating additional spatial dimensions, a technique recently developed for MIMO interference and X networks [19], [20]. In this paper, we propose an alternate relaying protocol with a source, a destination, and three amplify-and-forward (AF) relays with multiple antennas, in order to compensate for the loss of capacity pre-log factor in the multiple-antenna scenario. Compared with DF relaying, AF relaying requires much less delay and power consumption, since the signal processing and quantization needed for decoding are unnecessary at the relay. More specifically, in this paper inter-relay interference alignment (IA) is performed at only two of the three relays, whereas the source and all three relays must participate in the alignment operation in DF relaying. In particular, in the proposed scheme, IA is embedded to align the inter-relay interference from two relays whenever those two relays forward their messages to the destination while the remaining relay receives the signal from the source. The direct link between source and destination is not considered; recovering the loss of pre-log factor is more difficult in this case, since a direct link inherently ensures the full pre-log factor even without relaying links. We show that the proposed method can achieve 3M/4 DOFs, compared with M/2 DOFs for conventional AF relaying, when all nodes are equipped with M antennas. Linear filters are considered at the source, relays, and destination. We then propose a class of linear filters at the source and relays that maximize the achievable sum-rate of the system for different fading scenarios. The proposed filter design is based on applying the subgradient method consecutively and alternately. We verify that the proposed filters are robust and give significant improvement over a naive filter and conventional relaying schemes, although they only guarantee a local maximum of the achievable sum-rate. In addition, we propose a distributed algorithm to find the amplifying filters at the relays that does not require the relays to exchange channel information in order to align the inter-relay interference; this reduction in channel-information exchange comes at the cost of some loss in rate.

The remainder of this paper is organized as follows. In Section II, we introduce the system model of an alternate AF relaying protocol with three relays. Section III describes the source and relay filter designs for different fading scenarios. We present numerical examples in Section IV, and a brief conclusion summarizing the main results and discussing future work is given in Section V.

Throughout the paper, we use the following notation. Upper- and lower-case boldface letters denote matrices and vectors, respectively. $\mathbf{I}_m$ denotes an $m\times m$ identity matrix. $\mathbf{A}^T$, $\mathbf{A}^*$, $\mathbf{A}^H$, and $\mathbf{A}^{-1}$ denote the transpose, conjugate, Hermitian transpose, and inverse of an arbitrary matrix $\mathbf{A}$, respectively. $\mathbf{A}_{a:b}$ denotes the submatrix consisting of the $a$th to $b$th column vectors of $\mathbf{A}$. $\mathrm{Span}(\mathbf{A})$ represents the space spanned by the columns of $\mathbf{A}$, and $\mathrm{Span}(\mathbf{A}) \perp \mathrm{Span}(\mathbf{B})$ means that the column spaces of $\mathbf{A}$ and $\mathbf{B}$ are orthogonal. $\mathrm{tr}\{\cdot\}$, $E[\cdot]$, and $\mathrm{Re}(\cdot)$ denote the trace, expectation, and real part, respectively.

II. System Model

In this paper, we consider a half-duplex relay network consisting of one source, one destination, and three AF relays, denoted by S, D, and $R_i$ for $i \in \{1,2,3\}$, respectively. Each node is equipped with $M$ antennas, where $M$ is even, and operates in half-duplex mode, i.e., it cannot transmit and receive simultaneously. We assume that the channel between any two nodes is block fading during transmission, and the channel matrix from the $j$th node to the $i$th node in the $n$th time slot is denoted by $\mathbf{H}_{ij}[n] \in \mathbb{C}^{M\times M}$ for $i, j \in \{S, D, 1, 2, 3\}$ and $i \neq j$. We also assume that the direct link between S and D is negligible due to large path loss.

Fig. 1 illustrates the system model of the proposed method for two successive time slots. In each time slot, S transmits to some of the relays while the other relays forward their received signals to D in an alternating manner. This transmission protocol is repeated every two time slots, as summarized in Table I. In even time slots, S transmits $M$ data streams to $R_1$ and $R_2$, while $R_3$ forwards the signal received in the previous time slot to D. In odd time slots, S sends $M/2$ data streams to $R_3$, while $R_1$ and $R_2$ forward the signal received in the previous time slot to D. This can be equivalently viewed as a $2\times 3$ or $3\times 2$ interference Z channel, which is optimal in terms of achievable DOFs.
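The DOF gain of this schedule can be checked with simple arithmetic. The sketch below averages the number of streams delivered per time slot over one two-slot period, assuming (as described above) $M$ streams in even slots and $M/2$ streams in odd slots, and compares it with conventional two-hop half-duplex AF relaying, which delivers $M$ streams every two slots. The helper name `average_dof` is illustrative and not from the paper.

```python
from fractions import Fraction


def average_dof(M: int) -> tuple:
    """Average DOFs per time slot over one two-slot period (M antennas, M even).

    Returns (alternate relaying, conventional half-duplex AF relaying).
    """
    if M % 2:
        raise ValueError("M must be even")
    # Alternate relaying: M streams in even slots, M/2 streams in odd slots.
    alternate = Fraction(M + M // 2, 2)
    # Conventional two-hop AF relaying: M streams every two slots.
    conventional = Fraction(M, 2)
    return alternate, conventional


for M in (2, 4, 8):
    alt, conv = average_dof(M)
    print(f"M={M}: alternate {alt}, conventional {conv} DOFs per slot")
```

For $M = 4$ this gives 3 versus 2 DOFs per slot, i.e., $3M/4$ versus $M/2$.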

The symbol vector $\mathbf{s}[n]$ at S is generated from an independently encoded Gaussian codebook with $\mathbf{s}[n] \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$ for even $n$ and $\mathbf{s}[n] \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{M/2})$ for odd $n$. The symbol vector is beamformed by a linear precoding matrix $\mathbf{T}[n] \in \mathbb{C}^{M\times M}$ for even $n$ and $\mathbf{T}[n] \in \mathbb{C}^{M\times M/2}$ for odd $n$. The transmit signal vector in the $n$th time slot can then be written as $\mathbf{x}[n] = \sqrt{p_t[n]}\,\mathbf{T}[n]\mathbf{s}[n] \in \mathbb{C}^M$. We assume that the total transmit power for each transmission at S is limited to $P_S$, i.e., $\mathrm{tr}\{E[\mathbf{x}[n]\mathbf{x}^H[n]]\} = \mathrm{tr}\{p_t[n]\mathbf{T}[n]\mathbf{T}^H[n]\} = P_S$, where $p_t[n]$ is a normalization factor that satisfies the total power constraint. The received signal at the relays in each time slot is then given by

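$$
\begin{aligned}
\mathbf{y}_i[n] &= \mathbf{H}_{iS}[n]\mathbf{x}[n] + \mathbf{H}_{i3}[n]\mathbf{x}_3[n] + \mathbf{z}_i[n], && \text{even } n\ (i = 1,2),\\
\mathbf{y}_3[n] &= \mathbf{H}_{3S}[n]\mathbf{x}[n] + \sum_{i=1}^{2}\mathbf{H}_{3i}[n]\mathbf{x}_i[n] + \mathbf{z}_3[n], && \text{odd } n,
\end{aligned}
\tag{1}
$$

where $\mathbf{x}_i[n] \in \mathbb{C}^M$ is the transmit signal of the $i$th relay, amplified and forwarded to the destination, and $\mathbf{z}_i[n] \in \mathbb{C}^M$ is a complex white Gaussian noise vector distributed as $\mathcal{CN}(\mathbf{0}, \sigma_i^2\mathbf{I}_M)$. At each relay, the received signal is multiplied by an amplifying matrix $\mathbf{F}_i[n] \in \mathbb{C}^{M\times M}$, and the transmit signal is computed as

$$ \mathbf{x}_i[n] = \sqrt{p_i[n]}\,\mathbf{F}_i[n]\mathbf{y}_i[n-1], \tag{2} $$

where $p_i[n]$ is a normalization scalar of the $i$th relay that satisfies the total power constraint. We assume that the total transmit power at each relay is constrained to $P_R$, i.e., $\mathrm{tr}\{E[\mathbf{x}_i[n]\mathbf{x}_i^H[n]]\} = \mathrm{tr}\{p_i[n]\mathbf{F}_i[n]\boldsymbol{\Sigma}_i[n-1]\mathbf{F}_i^H[n]\} = P_R$, where we define $\boldsymbol{\Sigma}_i[n] = E[\mathbf{y}_i[n]\mathbf{y}_i^H[n]]$.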
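Both normalization factors follow directly from the power constraints. The NumPy sketch below is illustrative only: the i.i.d. Rayleigh channels, the values of $M$, $P_S$, $P_R$, and $\sigma_1^2$, and the helper names `source_scale` and `relay_scale` are assumptions, and the relay covariance uses the interference-free form of $\boldsymbol{\Sigma}_i[n]$ that the text gives below (3).

```python
import numpy as np

rng = np.random.default_rng(0)


def crandn(*shape):
    """Matrix with i.i.d. CN(0, 1) entries (assumed Rayleigh fading)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def source_scale(T, P_S):
    """p_t such that tr{p_t T T^H} = P_S."""
    return P_S / np.real(np.trace(T @ T.conj().T))


def relay_scale(F, Sigma, P_R):
    """p_i such that tr{p_i F Sigma F^H} = P_R, with Sigma = E[y y^H] at the relay."""
    return P_R / np.real(np.trace(F @ Sigma @ F.conj().T))


M, P_S, P_R, sigma2 = 4, 10.0, 10.0, 1.0   # assumed system parameters
T = crandn(M, M)                           # even-slot precoder (M streams)
H_1S = crandn(M, M)                        # S -> R1 channel
p_t = source_scale(T, P_S)
# Interference-free covariance at R1 (form of Sigma_i[n] stated below (3)).
Sigma_1 = p_t * H_1S @ T @ T.conj().T @ H_1S.conj().T + sigma2 * np.eye(M)
F_1 = crandn(M, M)                         # arbitrary amplifying matrix at R1
p_1 = relay_scale(F_1, Sigma_1, P_R)
print(p_t, p_1)
```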

In (1), the second terms of the received signals, $\mathbf{H}_{i3}[n]\mathbf{x}_3[n]$ and $\sum_{i=1}^{2}\mathbf{H}_{3i}[n]\mathbf{x}_i[n]$, are referred to as inter-relay interference from the other relays. We focus on perfectly canceling the inter-relay interference² by exploiting the amplifying matrix in (2) before the relays forward their transmit signals.

To ensure zero inter-relay interference in each time slot, it is required that

$$
\begin{aligned}
\mathbf{F}_3[n]\sum_{i=1}^{2}\mathbf{H}_{3i}[n-1]\mathbf{F}_i[n-1]\mathbf{y}_i[n-2] &= \mathbf{0}, && \text{even } n,\\
\mathbf{F}_i[n]\mathbf{H}_{i3}[n-1]\mathbf{F}_3[n-1]\mathbf{y}_3[n-2] &= \mathbf{0}, && \text{odd } n\ (i = 1,2).
\end{aligned}
\tag{3}
$$

Under the constraints in (3), the covariance matrix of the received signal at the $i$th relay, $\mathbf{y}_i[n]$, can be rewritten as $\boldsymbol{\Sigma}_i[n] = p_t[n]\mathbf{H}_{iS}[n]\mathbf{T}[n]\mathbf{T}^H[n]\mathbf{H}_{iS}^H[n] + \sigma_i^2\mathbf{I}_M$. The relays forward their signals to the destination in each time slot, and the received signal at D is finally written as

$$ \mathbf{y}_D[n] = \sqrt{p_t[n-1]}\,\bar{\mathbf{H}}[n,n-1]\mathbf{s}[n-1] + \mathbf{z}_D[n,n-1], \tag{4} $$

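where $\bar{\mathbf{H}}[n,n-1]$ is the effective channel matrix of the $(n-1)$th data symbol vector over the $(n-1)$th and $n$th time slots, defined as

$$
\bar{\mathbf{H}}[n,n-1] = \begin{cases}
\sqrt{p_3[n]}\,\mathbf{H}_{D3}[n]\mathbf{F}_3[n]\mathbf{H}_{3S}[n-1]\mathbf{T}[n-1], & \text{even } n,\\
\sum_{i=1}^{2}\sqrt{p_i[n]}\,\mathbf{H}_{Di}[n]\mathbf{F}_i[n]\mathbf{H}_{iS}[n-1]\mathbf{T}[n-1], & \text{odd } n,
\end{cases}
$$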

²These interference signals, which consist of previously transmitted signals, degrade the performance of the currently received signal at D, and alleviating their effect while detecting the desired data would complicate the implementation of the relays and the destination. In addition, since the inter-relay interference involves causal channel knowledge of all previous time slots, handling it would require a large amount of memory at the relays and the destination.
and $\mathbf{z}_D[n,n-1]$ is the Gaussian noise defined as

$$
\mathbf{z}_D[n,n-1] = \begin{cases}
\sqrt{p_3[n]}\,\mathbf{H}_{D3}[n]\mathbf{F}_3[n]\mathbf{z}_3[n-1] + \mathbf{z}_D[n], & \text{even } n,\\
\sum_{i=1}^{2}\sqrt{p_i[n]}\,\mathbf{H}_{Di}[n]\mathbf{F}_i[n]\mathbf{z}_i[n-1] + \mathbf{z}_D[n], & \text{odd } n.
\end{cases}
$$

Here, $\mathbf{z}_D[n] \in \mathbb{C}^M$ is a complex white Gaussian noise vector at D distributed as $\mathcal{CN}(\mathbf{0}, \sigma_D^2\mathbf{I}_M)$. We consider a linear filter $\mathbf{W}_D[n]$ at D, and the estimated data symbol vector is given by $\hat{\mathbf{s}}[n-1] = \mathbf{W}_D^H[n]\mathbf{y}_D[n]$. The mean square error (MSE) matrix of the $(n-1)$th data vector at D can be computed as

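$$
\begin{aligned}
\mathbf{E}[n,n-1] &= E\big[(\hat{\mathbf{s}}[n-1]-\mathbf{s}[n-1])(\hat{\mathbf{s}}[n-1]-\mathbf{s}[n-1])^H\big]\\
&= \big(\sqrt{p_t[n-1]}\,\mathbf{W}_D^H[n]\bar{\mathbf{H}}[n,n-1] - \mathbf{I}[n]\big)\big(\sqrt{p_t[n-1]}\,\mathbf{W}_D^H[n]\bar{\mathbf{H}}[n,n-1] - \mathbf{I}[n]\big)^H + \mathbf{W}_D^H[n]\boldsymbol{\Sigma}_z[n]\mathbf{W}_D[n],
\end{aligned}
\tag{5}
$$

where the covariance matrix of $\mathbf{z}_D[n,n-1]$ is

$$
\boldsymbol{\Sigma}_z[n] = E\big[\mathbf{z}_D[n,n-1]\mathbf{z}_D^H[n,n-1]\big] = \begin{cases}
\sigma_3^2 p_3[n]\mathbf{H}_{D3}[n]\mathbf{F}_3[n]\mathbf{F}_3^H[n]\mathbf{H}_{D3}^H[n] + \sigma_D^2\mathbf{I}_M, & \text{even } n,\\
\sum_{i=1}^{2}\sigma_i^2 p_i[n]\mathbf{H}_{Di}[n]\mathbf{F}_i[n]\mathbf{F}_i^H[n]\mathbf{H}_{Di}^H[n] + \sigma_D^2\mathbf{I}_M, & \text{odd } n,
\end{cases}
$$

and the identity matrix for each time slot is $\mathbf{I}[n] = \mathbf{I}_{M/2}$ for even $n$ and $\mathbf{I}[n] = \mathbf{I}_M$ for odd $n$, matching the number of data streams in $\mathbf{s}[n-1]$.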

The MSE-optimal linear filter that minimizes the MSE matrix is the Wiener filter [21], given in this case by $\mathbf{W}_D[n] = \big(p_t[n-1]\bar{\mathbf{H}}[n,n-1]\bar{\mathbf{H}}^H[n,n-1] + \boldsymbol{\Sigma}_z[n]\big)^{-1}\sqrt{p_t[n-1]}\,\bar{\mathbf{H}}[n,n-1]$. Plugging this minimum MSE (MMSE) filter into (5), the MSE matrix can be rewritten after some manipulations as

$$ \mathbf{E}[n,n-1] = \big(\mathbf{I}[n] + p_t[n-1]\bar{\mathbf{H}}^H[n,n-1]\boldsymbol{\Sigma}_z^{-1}[n]\bar{\mathbf{H}}[n,n-1]\big)^{-1}. \tag{6} $$

The achievable sum-rate of the $(n-1)$th data vector between S and D can be written as

$$ I[n,n-1] = \log_2\det\mathbf{E}^{-1}[n,n-1]. \tag{7} $$

To obtain the above sum-rate, we must find amplifying matrices at the relays that satisfy the constraints in (3). In the next section, we design a linear filter for each relay that cancels the inter-relay interference. In addition, a linear precoder at S and amplifying filters at $R_i$ are developed to maximize the sum-rate under different channel assumptions, i.e., slow and fast block fading.



III. Source/Relay Linear Filter Design 

To find valid amplifying matrices that perfectly remove the inter-relay interference in each time slot, let us recall the constraints in (3).
For odd time slots, the following two conditions should be met to cancel the inter-relay interference at $R_1$ and $R_2$:

$$ \mathrm{Span}(\mathbf{F}_i[n]) \perp \mathrm{Span}(\mathbf{H}_{i3}[n-1]\mathbf{F}_3[n-1]), \quad i = 1,2, \tag{8} $$

i.e., $\mathbf{F}_i[n]\mathbf{H}_{i3}[n-1]\mathbf{F}_3[n-1] = \mathbf{0}$. Since $\mathbf{F}_3[n-1]$ is the amplifying matrix used to forward the signal received in the previous (odd) $(n-2)$th time slot, in which $M/2$ symbols are transmitted to $R_3$, we can design a linear amplifying matrix with $\mathrm{rank}(\mathbf{F}_3[n-1]) = M/2$ without loss of DOFs. If we design such a rank-$M/2$ amplifying matrix at $R_3$, the dimension of the inter-relay interference subspace at $R_1$ and $R_2$ is guaranteed to be at most $M/2$, i.e., $\mathrm{rank}(\mathbf{H}_{13}[n-1]\mathbf{F}_3[n-1]) \le M/2$ and $\mathrm{rank}(\mathbf{H}_{23}[n-1]\mathbf{F}_3[n-1]) \le M/2$. Hence, at each relay there exists a subspace of dimension at least $M/2$ that is orthogonal to the inter-relay interference subspace. Therefore, we can design rank-$M/2$ amplifying matrices $\mathbf{F}_1[n]$ and $\mathbf{F}_2[n]$ for $R_1$ and $R_2$.
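The dimension-counting argument above can be illustrated numerically. In this sketch (random channels and $M = 4$ are assumptions), each of $R_1$ and $R_2$ takes an orthonormal basis of the orthogonal complement of its interference subspace via an SVD and builds a rank-$M/2$ amplifier that annihilates $\mathbf{H}_{i3}[n-1]\mathbf{F}_3[n-1]$. The helpers `orth_complement` and `num_rank` are hypothetical, and the left factor of each amplifier is random rather than optimized.

```python
import numpy as np

rng = np.random.default_rng(2)


def crandn(*shape):
    """Matrix with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def orth_complement(A, tol=1e-10):
    """Orthonormal basis of the orthogonal complement of Span(A) (via SVD)."""
    U, s, _ = np.linalg.svd(A)
    r = int(np.sum(s > tol * s.max()))     # numerical rank of A
    return U[:, r:]


def num_rank(A, rel_tol=1e-9):
    """Numerical rank with a tolerance relative to the largest singular value."""
    return np.linalg.matrix_rank(A, tol=rel_tol * np.linalg.norm(A, 2))


M = 4
F3_prev = crandn(M, M // 2) @ crandn(M // 2, M)   # rank-M/2 amplifier of R3 (slot n-1)
residuals, ranks = [], []
for i in (1, 2):
    H_i3 = crandn(M, M)                            # R3 -> Ri channel (slot n-1)
    W_i = orth_complement(H_i3 @ F3_prev)          # M x M/2 with W_i^H H_i3 F3 = 0
    B_i = crandn(M, W_i.shape[1])                  # free factor, left for rate optimization
    F_i = B_i @ W_i.conj().T                       # rank-M/2 amplifier of Ri (slot n)
    residuals.append(np.abs(F_i @ H_i3 @ F3_prev).max())
    ranks.append(num_rank(F_i))
print(residuals, ranks)
```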

Meanwhile, for even time slots, the constraint in (3) can be equivalently rewritten as

$$ \mathrm{Span}(\mathbf{F}_3[n]) \perp \bigcup_{i=1}^{2}\mathrm{Span}(\mathbf{H}_{3i}[n-1]\mathbf{F}_i[n-1]). $$

$\mathbf{F}_3[n]$ is required to have at least $M/2$ dimensions to forward the received signal without loss of DOFs. However, the inter-relay interference at $R_3$, $\sum_{i=1}^{2}\mathbf{H}_{3i}[n-1]\mathbf{F}_i[n-1]$, occupies up to $M$ dimensions if nothing is done to reduce its dimension. Note that each of the amplifying matrices $\mathbf{F}_1[n-1]$ and $\mathbf{F}_2[n-1]$ is designed to have $M/2$ dimensions in the odd time slot. We consider IA, in which the signals are designed to cast overlapping shadows at $R_3$ while remaining distinguishable at D. The two interference signals can be perfectly aligned on an $M/2$-dimensional subspace if we use amplifying matrices $\mathbf{F}_1[n-1]$ and $\mathbf{F}_2[n-1]$ satisfying

$$ \mathrm{Span}(\mathbf{F}_3[n]) \perp \mathrm{Span}(\mathbf{H}_{31}[n-1]\mathbf{F}_1[n-1]) = \mathrm{Span}(\mathbf{H}_{32}[n-1]\mathbf{F}_2[n-1]). \tag{9} $$

Then there exists an $M/2$-dimensional subspace orthogonal to the $M/2$-dimensional space spanned by the aligned inter-relay interference signals, and the rank-$M/2$ amplifying matrix $\mathbf{F}_3[n]$ can be designed on this orthogonal subspace. If the rank-$M/2$ amplifying matrices at the relays satisfy the conditions in (8) and (9), the inter-relay interference is perfectly canceled in all time slots.
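A minimal sketch of the alignment condition (9), assuming random channels and $M = 4$: choosing $\mathbf{F}_2[n-1] = \mathbf{H}_{32}^{-1}[n-1]\mathbf{H}_{31}[n-1]\mathbf{F}_1[n-1]$ is one simple (not necessarily rate-optimal) way to make the two interference spaces coincide, after which $\mathbf{F}_3[n]$ can be built on the $M/2$-dimensional orthogonal complement. Without alignment, the interference generically fills all $M$ dimensions. Helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)


def crandn(*shape):
    """Matrix with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def orth_complement(A, tol=1e-10):
    """Orthonormal basis of the orthogonal complement of Span(A) (via SVD)."""
    U, s, _ = np.linalg.svd(A)
    r = int(np.sum(s > tol * s.max()))
    return U[:, r:]


M = 4
H31, H32 = crandn(M, M), crandn(M, M)              # R1 -> R3 and R2 -> R3 (slot n-1)
F1 = crandn(M, M // 2) @ crandn(M // 2, M)         # rank-M/2 amplifier of R1
F2_naive = crandn(M, M // 2) @ crandn(M // 2, M)   # unaligned rank-M/2 amplifier of R2
F2_aligned = np.linalg.solve(H32, H31 @ F1)        # H32 F2 = H31 F1, so the spans in (9) coincide


def interference_rank(F2):
    """Dimension of the total inter-relay interference subspace seen at R3."""
    X = np.hstack([H31 @ F1, H32 @ F2])
    return np.linalg.matrix_rank(X, tol=1e-9 * np.linalg.norm(X, 2))


W3 = orth_complement(np.hstack([H31 @ F1, H32 @ F2_aligned]))   # M x M/2
F3 = crandn(M, M // 2) @ W3.conj().T                              # rank-M/2 amplifier of R3 (slot n)
residual = max(np.abs(F3 @ H31 @ F1).max(), np.abs(F3 @ H32 @ F2_aligned).max())
print(interference_rank(F2_naive), interference_rank(F2_aligned), residual)
```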

Now, we develop the linear precoder and amplifying filters that maximize the achievable sum-rate of the system under the zero inter-relay interference condition. As observed in (8) and (9), the amplifying filters of successive time slots affect each other. Consequently, the achievable sum-rate $I[n,n-1]$ is also coupled with the achievable sum-rates of the previous and next time slots, since the design criterion of $\mathbf{F}_i[n]$ depends on $\mathbf{F}_j[n-1]$ and $\mathbf{F}_j[n+1]$ for $j \neq i$. Ideally, to compute the optimal filters maximizing the achievable sum-rate, we should solve a joint sequential optimization problem over the parameter sets $\{\mathbf{F}_i[n]\,|\,\forall n, \forall i\}$ and $\{\mathbf{T}[n]\,|\,\forall n\}$, given by

$$ \max_{\{\mathbf{F}_i[n]|\forall n,\forall i\},\ \{\mathbf{T}[n]|\forall n\}}\ \sum_{\forall n} I[n,n-1]. \tag{10} $$

However, this sequential solution cannot be computed easily, since it requires noncausal channel knowledge for all time slots and the channel varies over the coherence time. In addition, such a joint optimization problem requires a huge amount of memory and complicates the implementation. Therefore, we propose suboptimal filter designs that find symbol-by-symbol linear filters for the source and relays.

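For convenience, when we compute the linear filters by solving the optimization problem, we omit the time index of the variables and introduce the following definitions, which depend only on whether the time slot is even or odd:

$$
\begin{aligned}
\mathbf{T}[n] &= \begin{cases} \mathbf{T}_e, & \text{even } n\\ \mathbf{T}_o, & \text{odd } n \end{cases}, \qquad
p_t[n] = \begin{cases} p_e, & \text{even } n\\ p_o, & \text{odd } n \end{cases}, \qquad
\mathbf{F}_i[n] = \mathbf{F}_i,\ \ p_i[n] = p_i,\ \ i = 1,2,3,\\
\boldsymbol{\Sigma}_i[n] &= \begin{cases} p_e\mathbf{H}_{iS}\mathbf{T}_e\mathbf{T}_e^H\mathbf{H}_{iS}^H + \sigma_i^2\mathbf{I}_M, & \text{even } n\ (i = 1,2)\\ p_o\mathbf{H}_{3S}\mathbf{T}_o\mathbf{T}_o^H\mathbf{H}_{3S}^H + \sigma_3^2\mathbf{I}_M, & \text{odd } n\ (i = 3) \end{cases},\\
\bar{\mathbf{H}}[n,n-1] &= \begin{cases} \bar{\mathbf{H}}_o = \sqrt{p_3}\,\mathbf{H}_{D3}\mathbf{F}_3\mathbf{H}_{3S}\mathbf{T}_o, & \text{even } n\\ \bar{\mathbf{H}}_e = \sum_{i=1}^{2}\sqrt{p_i}\,\mathbf{H}_{Di}\mathbf{F}_i\mathbf{H}_{iS}\mathbf{T}_e, & \text{odd } n \end{cases},\\
\boldsymbol{\Sigma}_z[n] &= \begin{cases} \boldsymbol{\Sigma}_o = \sigma_3^2 p_3\mathbf{H}_{D3}\mathbf{F}_3\mathbf{F}_3^H\mathbf{H}_{D3}^H + \sigma_D^2\mathbf{I}_M, & \text{even } n\\ \boldsymbol{\Sigma}_e = \sum_{i=1}^{2}\sigma_i^2 p_i\mathbf{H}_{Di}\mathbf{F}_i\mathbf{F}_i^H\mathbf{H}_{Di}^H + \sigma_D^2\mathbf{I}_M, & \text{odd } n \end{cases},\\
\mathbf{E}[n,n-1] &= \begin{cases} \mathbf{E}_o = \big(\mathbf{I}_{M/2} + p_o\bar{\mathbf{H}}_o^H\boldsymbol{\Sigma}_o^{-1}\bar{\mathbf{H}}_o\big)^{-1}, & \text{even } n\\ \mathbf{E}_e = \big(\mathbf{I}_M + p_e\bar{\mathbf{H}}_e^H\boldsymbol{\Sigma}_e^{-1}\bar{\mathbf{H}}_e\big)^{-1}, & \text{odd } n \end{cases},\\
I[n,n-1] &= \begin{cases} I_o = \log_2\det\mathbf{E}_o^{-1}, & \text{even } n\\ I_e = \log_2\det\mathbf{E}_e^{-1}, & \text{odd } n. \end{cases}
\end{aligned}
$$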
Note that all variables with subscript $e$ are related to transmission over the S-($R_1$, $R_2$)-D link during the two time slots from an even to an odd time slot, and the rest, with subscript $o$, are related to transmission over the S-$R_3$-D link during the two time slots from an odd to an even time slot.

A. Iterative Source/Relay Filter Design For Slow Fading 

Let us first consider the filter design for a slow fading channel, in which the channel gain is random but remains constant, $\mathbf{H}_{ij}[n] = \mathbf{H}_{ij}$ for all $n$. In this case, designing the linear filters to maximize the achievable sum-rate in (10) is equivalent to jointly optimizing the linear filters for two time slots, since the optimization is the same for every pair of time slots regardless of the time index. Therefore, the linear filters of the source and relays remain constant over every even time slot and every odd time slot, and the designs for the two types of time slots are coupled. The rate-maximization problem in (10) can then be reformulated as a joint optimization problem with parameters for the even and odd time slots:

$$
\begin{aligned}
\max_{\mathbf{F}_1,\mathbf{F}_2,\mathbf{F}_3,\mathbf{T}_e,\mathbf{T}_o}\ & \frac{1}{2}\big(\log_2\det\mathbf{E}_o^{-1} + \log_2\det\mathbf{E}_e^{-1}\big)\\
\text{s.t.}\ & \mathrm{Span}(\mathbf{F}_3) \perp \mathrm{Span}(\mathbf{H}_{31}\mathbf{F}_1) = \mathrm{Span}(\mathbf{H}_{32}\mathbf{F}_2),\\
& \mathrm{Span}(\mathbf{H}_{13}\mathbf{F}_3) \perp \mathrm{Span}(\mathbf{F}_1),\quad \mathrm{Span}(\mathbf{H}_{23}\mathbf{F}_3) \perp \mathrm{Span}(\mathbf{F}_2).
\end{aligned}
\tag{11}
$$

To solve this problem, each node requires global channel state information (CSI), which can be acquired at the beginning of transmission. The problem is not convex and is therefore difficult to solve with standard optimization tools. To find a suboptimal solution, we define the amplifying matrix of each relay as the product of two rank-$M/2$ matrices:

$$ \mathbf{F}_i = \mathbf{B}_i\mathbf{W}_i^H, \tag{12} $$

where $\mathbf{B}_i \in \mathbb{C}^{M\times M/2}$ and $\mathbf{W}_i \in \mathbb{C}^{M\times M/2}$ are referred to as the forward matrix and the backward matrix, respectively. Then, the IA constraint in (9) that cancels the inter-relay interference at $R_3$ can be rewritten as

$$ \mathrm{Span}(\mathbf{W}_3) \perp \mathrm{Span}(\mathbf{H}_{31}\mathbf{B}_1) = \mathrm{Span}(\mathbf{H}_{32}\mathbf{B}_2). \tag{13} $$
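We impose the following structure: $\mathbf{H}_{31}\mathbf{B}_1 = \mathbf{U}_b\boldsymbol{\phi}_1$ and $\mathbf{H}_{32}\mathbf{B}_2 = \mathbf{U}_b\boldsymbol{\phi}_2$, where $\mathbf{U}_b \in \mathbb{C}^{M\times M/2}$ is a basis matrix spanning the aligned interference subspace and $\boldsymbol{\phi}_i \in \mathbb{C}^{M/2\times M/2}$ is an arbitrary matrix. We can now rewrite the backward and forward matrices as

$$ \mathbf{W}_3 = \mathbf{U}_b^{\perp}\boldsymbol{\phi}_3, \quad \text{and} \quad \mathbf{B}_i = \mathbf{H}_{3i}^{-1}\mathbf{U}_b\boldsymbol{\phi}_i,\ \ i = 1,2, \tag{14} $$

where we define $\mathbf{Z}^{\perp} = \mathbf{I}_M - \mathbf{Z}(\mathbf{Z}^H\mathbf{Z})^{-1}\mathbf{Z}^H$ for an arbitrary $\mathbf{Z}$, and $\boldsymbol{\phi}_3 \in \mathbb{C}^{M\times M/2}$ is an arbitrary matrix. Since $\mathbf{W}_3$ should be orthogonal to the aligned interference signals, it is projected onto the orthogonal complement of $\mathrm{Span}(\mathbf{U}_b)$.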

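On the other hand, to cancel the inter-relay interference at $R_1$ and $R_2$, the following conditions should be met:

$$ \mathrm{Span}(\mathbf{H}_{i3}\mathbf{B}_3) \perp \mathrm{Span}(\mathbf{W}_i), \quad i = 1,2. \tag{15} $$

Both conditions in (15) can be equivalently expressed as $\mathrm{Span}(\mathbf{B}_3) \perp \mathrm{Span}(\mathbf{H}_{13}^H\mathbf{W}_1)$ and $\mathrm{Span}(\mathbf{B}_3) \perp \mathrm{Span}(\mathbf{H}_{23}^H\mathbf{W}_2)$. Since $\mathbf{H}_{13}^H\mathbf{W}_1$ and $\mathbf{H}_{23}^H\mathbf{W}_2$ both span $M/2$-dimensional subspaces lying in the $M/2$-dimensional orthogonal complement of $\mathrm{Span}(\mathbf{B}_3)$, the two matrices must lie on the same subspace, i.e., $\mathrm{Span}(\mathbf{H}_{13}^H\mathbf{W}_1) = \mathrm{Span}(\mathbf{H}_{23}^H\mathbf{W}_2)$. Similarly to the forward matrices $\mathbf{B}_1$ and $\mathbf{B}_2$, we impose the structure $\mathbf{H}_{13}^H\mathbf{W}_1 = \mathbf{U}_w\boldsymbol{\psi}_1$ and $\mathbf{H}_{23}^H\mathbf{W}_2 = \mathbf{U}_w\boldsymbol{\psi}_2$, where $\mathbf{U}_w \in \mathbb{C}^{M\times M/2}$ is a basis matrix and $\boldsymbol{\psi}_i \in \mathbb{C}^{M/2\times M/2}$ is an arbitrary matrix. Using this notation, we can rewrite the forward and backward matrices as

$$ \mathbf{B}_3 = \mathbf{U}_w^{\perp}\boldsymbol{\psi}_3, \quad \text{and} \quad \mathbf{W}_i = \mathbf{H}_{i3}^{-H}\mathbf{U}_w\boldsymbol{\psi}_i,\ \ i = 1,2, \tag{16} $$

where $\boldsymbol{\psi}_3 \in \mathbb{C}^{M\times M/2}$ is an arbitrary matrix. The amplifying matrices of the relays are then represented as

$$ \mathbf{F}_i = \mathbf{H}_{3i}^{-1}\mathbf{U}_b\mathbf{G}_i\mathbf{U}_w^H\mathbf{H}_{i3}^{-1},\ \ i = 1,2, \quad \text{and} \quad \mathbf{F}_3 = \mathbf{U}_w^{\perp}\mathbf{G}_3\mathbf{U}_b^{\perp}, \tag{17} $$

where $\mathbf{G}_i = \boldsymbol{\phi}_i\boldsymbol{\psi}_i^H$ for $i = 1,2$ and $\mathbf{G}_3 = \boldsymbol{\psi}_3\boldsymbol{\phi}_3^H$ are arbitrary matrices. Using the definitions in (17), (11) can be reformulated as

$$ \max_{\mathbf{U}_b,\mathbf{U}_w,\mathbf{G}_1,\mathbf{G}_2,\mathbf{G}_3,\mathbf{T}_e,\mathbf{T}_o}\ \frac{1}{2}\big(\log_2\det\mathbf{E}_o^{-1} + \log_2\det\mathbf{E}_e^{-1}\big). $$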
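The parameterization (17) satisfies the interference-cancelation constraints of (11) for any choice of $\mathbf{U}_b$, $\mathbf{U}_w$, and $\mathbf{G}_i$, which is why the reformulated problem carries no explicit subspace constraints. The sketch below checks this numerically with random channels and random parameters (all assumptions); `structured_filters`, `perp`, and `num_rank` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(4)


def crandn(*shape):
    """Matrix with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def perp(Z):
    """Z^perp = I - Z (Z^H Z)^{-1} Z^H: projector onto the complement of Span(Z)."""
    return np.eye(Z.shape[0]) - Z @ np.linalg.solve(Z.conj().T @ Z, Z.conj().T)


def num_rank(A, rel_tol=1e-9):
    """Numerical rank with a tolerance relative to the largest singular value."""
    return np.linalg.matrix_rank(A, tol=rel_tol * np.linalg.norm(A, 2))


def structured_filters(H, Ub, Uw, G):
    """Relay amplifiers of (17); H[(i, j)] is the channel from node j to node i."""
    F = {}
    for i in (1, 2):
        F[i] = np.linalg.inv(H[(3, i)]) @ Ub @ G[i] @ Uw.conj().T @ np.linalg.inv(H[(i, 3)])
    F[3] = perp(Uw) @ G[3] @ perp(Ub)
    return F


M = 4
H = {key: crandn(M, M) for key in [(3, 1), (3, 2), (1, 3), (2, 3)]}
Ub, Uw = crandn(M, M // 2), crandn(M, M // 2)      # arbitrary basis matrices
G = {1: crandn(M // 2, M // 2), 2: crandn(M // 2, M // 2), 3: crandn(M, M)}
F = structured_filters(H, Ub, Uw, G)

res_R3 = max(np.abs(F[3] @ H[(3, i)] @ F[i]).max() for i in (1, 2))    # constraint at R3, (9)
res_R12 = max(np.abs(F[i] @ H[(i, 3)] @ F[3]).max() for i in (1, 2))   # constraints at R1, R2, (8)
print(res_R3, res_R12, [num_rank(F[i]) for i in (1, 2, 3)])
```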

To maximize the above achievable sum-rate, we consider an iterative algorithm based on the subgradient method, a first-order optimization method that converges to a local optimum of the objective function.³ We therefore first find the partial derivatives of the objective function with respect to $\mathbf{U}_b^*$, $\mathbf{U}_w^*$, $\mathbf{G}_1^*$, $\mathbf{G}_2^*$, $\mathbf{G}_3^*$, $\mathbf{T}_e^*$, and $\mathbf{T}_o^*$, which give the direction in which the objective increases fastest at each iteration.

³The method is simple and easy to use, although its convergence can be very slow in the worst case. Briefly reviewing its operation: the derivative of the objective function is given by $\partial f(\mathbf{Z},\mathbf{Z}^*)/\partial\mathbf{Z}^*$, where $f(\mathbf{Z},\mathbf{Z}^*)$ is an objective function of $\mathbf{Z}$ and $\mathbf{Z}^*$. The $k$th iteration of the method can be formulated as $\mathbf{Z}^{[k+1]} = \mathbf{Z}^{[k]} + \mu^{[k]}\,\partial f(\mathbf{Z}^{[k]},\mathbf{Z}^{[k]*})/\partial\mathbf{Z}^*$, where $\mu^{[k]}$ is a step size parameter. In this paper, the step size is determined by Armijo's rule [22], which guarantees $f(\mathbf{Z}^{[k+1]}) \ge f(\mathbf{Z}^{[k]})$. We set $\mu^{[k]} = \nu^m$, where $m$ is the smallest nonnegative integer such that $f\big(\mathbf{Z}^{[k]} + \nu^m\,\partial f(\mathbf{Z}^{[k]},\mathbf{Z}^{[k]*})/\partial\mathbf{Z}^*\big) \ge f(\mathbf{Z}^{[k]}) + \zeta\nu^m\big\|\partial f(\mathbf{Z}^{[k]},\mathbf{Z}^{[k]*})/\partial\mathbf{Z}^*\big\|_F^2$ for $\zeta, \nu \in (0,1)$. We set $\zeta = 0.2$ and $\nu = 0.5$ for the numerical results in this paper.

Defining the objective function as $f_1 = \frac{1}{2}(I_o + I_e)$, the partial derivatives of $f_1$ with respect to the matrices at the relays can be computed as⁴

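$$ \frac{\partial f_1}{\partial\mathbf{U}_b^*} = \frac{p_e}{2\ln 2}\sum_{i=1}^{2}\mathbf{H}_{3i}^{-H}\bar{\boldsymbol{\Xi}}_i\mathbf{H}_{i3}^{-H}\mathbf{U}_w\mathbf{G}_i^H - \frac{p_o}{2\ln 2}\mathbf{U}_b^{\perp}\big(\bar{\boldsymbol{\Xi}}_3^H\mathbf{U}_w^{\perp}\mathbf{G}_3 + \mathbf{G}_3^H\mathbf{U}_w^{\perp}\bar{\boldsymbol{\Xi}}_3\big)\mathbf{U}_b^{\dagger}, \tag{18} $$

$$ \frac{\partial f_1}{\partial\mathbf{U}_w^*} = \frac{p_e}{2\ln 2}\sum_{i=1}^{2}\mathbf{H}_{i3}^{-1}\bar{\boldsymbol{\Xi}}_i^H\mathbf{H}_{3i}^{-1}\mathbf{U}_b\mathbf{G}_i - \frac{p_o}{2\ln 2}\mathbf{U}_w^{\perp}\big(\mathbf{G}_3\mathbf{U}_b^{\perp}\bar{\boldsymbol{\Xi}}_3^H + \bar{\boldsymbol{\Xi}}_3\mathbf{U}_b^{\perp}\mathbf{G}_3^H\big)\mathbf{U}_w^{\dagger}, \tag{19} $$

$$ \frac{\partial f_1}{\partial\mathbf{G}_i^*} = \frac{p_e}{2\ln 2}\mathbf{U}_b^H\mathbf{H}_{3i}^{-H}\bar{\boldsymbol{\Xi}}_i\mathbf{H}_{i3}^{-H}\mathbf{U}_w, \quad i = 1,2, \tag{20} $$

$$ \frac{\partial f_1}{\partial\mathbf{G}_3^*} = \frac{p_o}{2\ln 2}\mathbf{U}_w^{\perp}\bar{\boldsymbol{\Xi}}_3\mathbf{U}_b^{\perp}, \tag{21} $$

where we define $\mathbf{Z}^{\dagger} = \mathbf{Z}(\mathbf{Z}^H\mathbf{Z})^{-1}$ for an arbitrary $\mathbf{Z}$, and

$$ \boldsymbol{\Xi}_i = \mathbf{H}_{Di}^H\boldsymbol{\Sigma}_e^{-1}\bar{\mathbf{H}}_e\mathbf{E}_e\big(\mathbf{T}_e^H\mathbf{H}_{iS}^H - \sigma_i^2\sqrt{p_i}\,\bar{\mathbf{H}}_e^H\boldsymbol{\Sigma}_e^{-1}\mathbf{H}_{Di}\mathbf{F}_i\big), \quad i = 1,2, $$

$$ \boldsymbol{\Xi}_3 = \mathbf{H}_{D3}^H\boldsymbol{\Sigma}_o^{-1}\bar{\mathbf{H}}_o\mathbf{E}_o\big(\mathbf{T}_o^H\mathbf{H}_{3S}^H - \sigma_3^2\sqrt{p_3}\,\bar{\mathbf{H}}_o^H\boldsymbol{\Sigma}_o^{-1}\mathbf{H}_{D3}\mathbf{F}_3\big), $$

$$ \bar{\boldsymbol{\Xi}}_i = \sqrt{p_i}\Big(\boldsymbol{\Xi}_i - \frac{p_i}{P_R}\mathrm{Re}\big(\mathrm{tr}\{\mathbf{F}_i^H\boldsymbol{\Xi}_i\}\big)\mathbf{F}_i\boldsymbol{\Sigma}_i\Big), \quad i = 1,2,3. $$

With these definitions, $\partial f_1/\partial\mathbf{F}_i^* = \frac{p_e}{2\ln 2}\bar{\boldsymbol{\Xi}}_i$ for $i = 1,2$ and $\partial f_1/\partial\mathbf{F}_3^* = \frac{p_o}{2\ln 2}\bar{\boldsymbol{\Xi}}_3$, where the second term of $\bar{\boldsymbol{\Xi}}_i$ accounts for the dependence of the normalization factor $p_i$ on $\mathbf{F}_i$; (18)-(21) then follow from the chain rule applied to (17).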
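Gradient expressions of this kind are easy to get wrong, so a finite-difference check is a useful safeguard. The sketch below assumes random channels, unit noise variances, and illustrative power budgets, and verifies $\partial f_1/\partial\mathbf{F}_3^* = \frac{p_o}{2\ln 2}\bar{\boldsymbol{\Xi}}_3$ for the S-$R_3$-D term, including the dependence of $p_3$ on $\mathbf{F}_3$, using $df = 2\,\mathrm{Re}\,\mathrm{tr}\{(\partial f/\partial\mathbf{F}^*)^H d\mathbf{F}\}$. It checks the formula above and is not the authors' implementation; `link`, `f_o`, and `grad_o` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(5)


def crandn(*shape):
    """Matrix with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


M, P_S, P_R = 4, 5.0, 5.0                      # assumed dimensions and power budgets
s3, sD = 1.0, 1.0                              # assumed sigma_3^2 and sigma_D^2
H3S, HD3 = crandn(M, M), crandn(M, M)
To = crandn(M, M // 2)                         # odd-slot precoder (M/2 streams)
p_o = P_S / np.real(np.trace(To @ To.conj().T))
Sigma3 = p_o * H3S @ To @ To.conj().T @ H3S.conj().T + s3 * np.eye(M)


def link(F3):
    """p_3, effective channel, noise covariance, and MSE matrix of the S-R3-D link."""
    p3 = P_R / np.real(np.trace(F3 @ Sigma3 @ F3.conj().T))
    Ho = np.sqrt(p3) * HD3 @ F3 @ H3S @ To
    So = s3 * p3 * HD3 @ F3 @ F3.conj().T @ HD3.conj().T + sD * np.eye(M)
    Eo = np.linalg.inv(np.eye(M // 2) + p_o * Ho.conj().T @ np.linalg.solve(So, Ho))
    return p3, Ho, So, Eo


def f_o(F3):
    """I_o / 2, the part of f_1 that depends on F3."""
    return -0.5 * np.log2(np.linalg.det(link(F3)[3]).real)


def grad_o(F3):
    """Reconstructed d f_1 / d F3^* = p_o / (2 ln 2) * Xi_bar_3."""
    p3, Ho, So, Eo = link(F3)
    Si = np.linalg.inv(So)
    Xi = HD3.conj().T @ Si @ Ho @ Eo @ (
        To.conj().T @ H3S.conj().T - s3 * np.sqrt(p3) * Ho.conj().T @ Si @ HD3 @ F3)
    corr = (p3 / P_R) * np.real(np.trace(F3.conj().T @ Xi))   # power-normalization term
    Xi_bar = np.sqrt(p3) * (Xi - corr * F3 @ Sigma3)
    return p_o / (2 * np.log(2)) * Xi_bar


F3 = crandn(M, M)
D = crandn(M, M)                               # random perturbation direction
t = 1e-6
numeric = (f_o(F3 + t * D) - f_o(F3 - t * D)) / (2 * t)
analytic = 2 * np.real(np.trace(grad_o(F3).conj().T @ D))     # df = 2 Re tr{g^H dF}
print(numeric, analytic)
```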



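Next, we find the partial derivatives of $f_1$ with respect to $\mathbf{T}_e^*$ and $\mathbf{T}_o^*$ at S. In [23, Eq. (19)], the partial derivative of the achievable sum-rate with respect to the precoding matrix at the source was obtained for one-way and two-way relaying systems with multiple MIMO relays. Using this result, the partial derivatives with respect to $\mathbf{T}_e^*$ and $\mathbf{T}_o^*$ are given by

$$ \frac{\partial f_1}{\partial\mathbf{T}_e^*} = \frac{p_e}{2\ln 2}\Big(\sum_{i=1}^{2}\sqrt{p_i}\,\mathbf{H}_{iS}^H\mathbf{F}_i^H\mathbf{H}_{Di}^H\boldsymbol{\Sigma}_e^{-1}\bar{\mathbf{H}}_e\mathbf{E}_e - \frac{p_e}{P_S}\mathrm{tr}\{\bar{\mathbf{H}}_e^H\boldsymbol{\Sigma}_e^{-1}\bar{\mathbf{H}}_e\mathbf{E}_e\}\mathbf{T}_e\Big) - \sum_{i=1}^{2}\frac{p_e^2 p_i\sqrt{p_i}}{2P_R\ln 2}\mathrm{Re}\big(\mathrm{tr}\{\mathbf{F}_i^H\boldsymbol{\Xi}_i\}\big)\Big(\boldsymbol{\Phi}_i - \frac{p_e}{P_S}\mathrm{tr}\{\mathbf{T}_e^H\boldsymbol{\Phi}_i\mathbf{T}_e\}\mathbf{I}_M\Big)\mathbf{T}_e, \tag{22} $$

$$ \frac{\partial f_1}{\partial\mathbf{T}_o^*} = \frac{p_o}{2\ln 2}\Big(\sqrt{p_3}\,\mathbf{H}_{3S}^H\mathbf{F}_3^H\mathbf{H}_{D3}^H\boldsymbol{\Sigma}_o^{-1}\bar{\mathbf{H}}_o\mathbf{E}_o - \frac{p_o}{P_S}\mathrm{tr}\{\bar{\mathbf{H}}_o^H\boldsymbol{\Sigma}_o^{-1}\bar{\mathbf{H}}_o\mathbf{E}_o\}\mathbf{T}_o\Big) - \frac{p_o^2 p_3\sqrt{p_3}}{2P_R\ln 2}\mathrm{Re}\big(\mathrm{tr}\{\mathbf{F}_3^H\boldsymbol{\Xi}_3\}\big)\Big(\boldsymbol{\Phi}_3 - \frac{p_o}{P_S}\mathrm{tr}\{\mathbf{T}_o^H\boldsymbol{\Phi}_3\mathbf{T}_o\}\mathbf{I}_M\Big)\mathbf{T}_o, \tag{23} $$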

where we define $\boldsymbol{\Phi}_i = \mathbf{H}_{iS}^H\mathbf{F}_i^H\mathbf{F}_i\mathbf{H}_{iS}$ for $i = 1,2,3$, and $\boldsymbol{\Xi}_i$ is defined below (21). Using the partial derivatives with respect to the matrices at the source and relays, we propose an iterative algorithm that applies the subgradient method to each matrix sequentially. The basic idea is to take a step along the gradient direction with respect to each matrix at every iteration and to repeat the iteration until a local maximum of the achievable sum-rate is approached. The operation of the proposed iterative algorithm is described below. In this algorithm, $\mu_i$ for $i \in \{b, w, 1, 2, 3, e, o\}$ is a step size parameter that may change at every iteration, and $\epsilon$ is a precision factor used to terminate the iterative procedure.

⁴The derivation of the partial derivatives is described in detail in Appendix A.

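Iterative Algorithm I

Initialization

1) Initialize the matrices $\mathbf{U}_b^{[0]}$, $\mathbf{U}_w^{[0]}$, $\mathbf{G}_1^{[0]}$, $\mathbf{G}_2^{[0]}$, $\mathbf{G}_3^{[0]}$, $\mathbf{T}_e^{[0]}$, and $\mathbf{T}_o^{[0]}$, and set $k = 0$.

Iteration

2) Compute the partial derivative $\partial f_1(\mathbf{U}_b^{[k]})/\partial\mathbf{U}_b^*$ and update the matrix, $\mathbf{U}_b^{[k+1]} = \mathbf{U}_b^{[k]} + \mu_b^{[k]}\,\partial f_1(\mathbf{U}_b^{[k]})/\partial\mathbf{U}_b^*$.

3) Compute the partial derivative $\partial f_1(\mathbf{U}_w^{[k]})/\partial\mathbf{U}_w^*$ and update the matrix, $\mathbf{U}_w^{[k+1]} = \mathbf{U}_w^{[k]} + \mu_w^{[k]}\,\partial f_1(\mathbf{U}_w^{[k]})/\partial\mathbf{U}_w^*$.

4) Compute the partial derivative $\partial f_1(\mathbf{G}_i^{[k]})/\partial\mathbf{G}_i^*$ and update each matrix, $\mathbf{G}_i^{[k+1]} = \mathbf{G}_i^{[k]} + \mu_i^{[k]}\,\partial f_1(\mathbf{G}_i^{[k]})/\partial\mathbf{G}_i^*$, for $i = 1,2,3$.

5) Compute the partial derivative $\partial f_1(\mathbf{T}_i^{[k]})/\partial\mathbf{T}_i^*$ and update each matrix, $\mathbf{T}_i^{[k+1]} = \mathbf{T}_i^{[k]} + \mu_i^{[k]}\,\partial f_1(\mathbf{T}_i^{[k]})/\partial\mathbf{T}_i^*$, for $i \in \{e, o\}$.

6) If $f_1^{[k+1]} - f_1^{[k]} < \epsilon$, stop. Otherwise, set $k \leftarrow k + 1$ and repeat 2)-5).

Results

7) Output the matrices $\mathbf{T}_i^{[k+1]}$ for $i \in \{e, o\}$ and $\mathbf{F}_i^{[k+1]}$ for $i = 1,2,3$, obtained from (17).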
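The update rule of Algorithm I, with the Armijo step of footnote 3, can be sketched on a smaller problem. To keep the example short, it does not implement the relay setup: as an assumption, it maximizes the rate $\log_2\det(\mathbf{I} + p\,\mathbf{T}^H\mathbf{H}^H\mathbf{H}\mathbf{T})$ of a single point-to-point link over a power-normalized precoder $\mathbf{T}$ with $p = P/\mathrm{tr}\{\mathbf{T}\mathbf{T}^H\}$ and unit noise variance. Its gradient has the same structure as the first two terms of (22). We use $\zeta = 0.2$ and $\nu = 0.5$ as in the text; `rate`, `grad`, and `armijo_ascent` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(6)


def crandn(*shape):
    """Matrix with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


M, d, P = 4, 2, 10.0
H = crandn(M, M)
K = H.conj().T @ H                      # unit noise variance assumed


def rate(T):
    """log2 det(I + p T^H K T) with p = P / tr{T T^H} (power-normalized precoder)."""
    p = P / np.real(np.trace(T @ T.conj().T))
    return np.log2(np.linalg.det(np.eye(d) + p * T.conj().T @ K @ T).real)


def grad(T):
    """d rate / d T^*, including the dependence of p on T."""
    p = P / np.real(np.trace(T @ T.conj().T))
    E = np.linalg.inv(np.eye(d) + p * T.conj().T @ K @ T)
    KTE = K @ T @ E
    return p / np.log(2) * (KTE - (p / P) * np.real(np.trace(T.conj().T @ KTE)) * T)


def armijo_ascent(f, g, Z, zeta=0.2, nu=0.5, eps=1e-8, max_iter=500):
    """Gradient ascent with step mu = nu**m chosen by Armijo's rule."""
    history = [f(Z)]
    for _ in range(max_iter):
        G = g(Z)
        sq = np.linalg.norm(G) ** 2             # squared Frobenius norm of the gradient
        mu = 1.0
        for _ in range(60):                     # smallest m with sufficient increase
            if f(Z + mu * G) >= history[-1] + zeta * mu * sq:
                break
            mu *= nu
        Z = Z + mu * G
        history.append(f(Z))
        if history[-1] - history[-2] < eps:     # stopping rule of step 6)
            break
    return Z, history


T0 = crandn(M, d)
T_opt, hist = armijo_ascent(rate, grad, T0)
print(hist[0], hist[-1])
```

In Algorithm I, the same update is applied in turn to each of $\mathbf{U}_b$, $\mathbf{U}_w$, $\mathbf{G}_i$, and $\mathbf{T}_i$ with its own step size $\mu_i$.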

B. Alternately Iterative Source/Relay Filter Design For Fast Fading 

1) Scenario 1: Flat Fading Per Two Time Slots: We now consider the filter design for the block fading channel, which is often assumed in cooperative and relaying systems. Relaying systems usually assume that the channel remains constant over the two hops, where the transmission stages from source to relay and from relay to destination are called the first phase and the second phase, respectively. Likewise, we assume that the channel matrices remain constant during two consecutive time slots over the S-($R_1$, $R_2$)-D link, i.e., $\mathbf{H}_{ij}[n-1] = \mathbf{H}_{ij}[n]$ for $i, j \in \{S, D, 1, 2, 3\}$, $i \neq j$, and odd $n$. In this fading scenario, we develop a distributed alternate relaying system that cancels the inter-relay interference without exchanging channel information or using feedback. To avoid feedback channels for reporting forward channel information to S, we only consider the design of the amplifying filters at the relays. Since the transmit precoding filter at S then does not depend on the channel, we simply set $\mathbf{T}[n] = \mathbf{I}_M$ for even $n$ and $\mathbf{T}[n] = (\mathbf{I}_M)_{1:M/2}$ for odd $n$.
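The distributed construction developed in the rest of this subsection rests on reciprocity of the inter-relay channels, $\mathbf{H}_{3i}[n] = \mathbf{H}_{i3}^T[n-1]$. The sketch below previews it numerically under that assumption, with random channels: $R_1$ and $R_2$ use only $\mathbf{H}_{i3}[n-1]\mathbf{B}_3[n-1]$ to choose $\mathbf{W}_i[n] = \mathbf{U}_i$ and $\mathbf{B}_i[n] = \mathbf{W}_i^*[n]\boldsymbol{\xi}_i[n]$, and $R_3$ uses only its own $\mathbf{B}_3[n-1]$ to set $\mathbf{W}_3[n+1] = \mathbf{B}_3^*[n-1]\boldsymbol{\xi}_3[n+1]$, as in (24). Random matrices stand in for the locally optimized $\boldsymbol{\xi}_i$, and `orth_complement` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(7)


def crandn(*shape):
    """Matrix with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def orth_complement(A, tol=1e-10):
    """Orthonormal basis of the orthogonal complement of Span(A) (via SVD)."""
    U, s, _ = np.linalg.svd(A)
    r = int(np.sum(s > tol * s.max()))
    return U[:, r:]


M = 4
H13, H23 = crandn(M, M), crandn(M, M)      # R3 -> R1 and R3 -> R2 in slot n-1
H31, H32 = H13.T, H23.T                    # assumed reciprocity: H_3i[n] = H_i3[n-1]^T
B3 = crandn(M, M // 2)                     # forward matrix chosen by R3 for slot n-1
xi = {i: crandn(M // 2, M // 2) for i in (1, 2, 3)}   # stand-ins for locally optimized xi_i

# R1, R2: only their own effective interference H_i3[n-1] B3[n-1] is needed.
W = {1: orth_complement(H13 @ B3), 2: orth_complement(H23 @ B3)}   # W_i[n] = U_i
B = {i: W[i].conj() @ xi[i] for i in (1, 2)}                       # B_i[n] = W_i^*[n] xi_i[n]
# R3: only its own forward matrix B3[n-1] is needed.
W3_next = B3.conj() @ xi[3]                                        # W_3[n+1] = B_3^*[n-1] xi_3

res_Ri = max(np.abs(W[i].conj().T @ Hi3 @ B3).max() for i, Hi3 in ((1, H13), (2, H23)))
res_R3 = max(np.abs(W3_next.conj().T @ H3i @ B[i]).max() for i, H3i in ((1, H31), (2, H32)))
print(res_Ri, res_R3)
```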

Note that each relay knows only its local channel information; that is, $R_i$ has only the backward, forward, and inter-relay channel information, $\mathbf{H}_{iS}[n]$, $\mathbf{H}_{Di}[n]$, and $\mathbf{H}_{ij}[n]$ for $j \neq i$. Each relay can estimate the backward and forward channels by receiving training signals broadcast by S and D, respectively, and the inter-relay channel by overhearing the pilot signals sent to D by $R_3$. Even though each relay knows only local CSI, the inter-relay interference must still be canceled so that the relays forward only the desired message to D. Without loss of generality, let us consider odd $n$. From (1), (2), and (12), the received signals at the relays in the $(n-1)$th and $n$th time slots can be written as

$$
\begin{aligned}
\mathbf{y}_i[n-1] &= \sqrt{p_t[n-1]}\,\mathbf{H}_{iS}[n-1]\mathbf{T}[n-1]\mathbf{s}[n-1] + \sqrt{p_3[n-1]}\,\mathbf{H}_{i3}[n-1]\mathbf{B}_3[n-1]\mathbf{W}_3^H[n-1]\mathbf{y}_3[n-2] + \mathbf{z}_i[n-1], \quad i = 1,2,\\
\mathbf{y}_3[n] &= \sqrt{p_t[n]}\,\mathbf{H}_{3S}[n]\mathbf{T}[n]\mathbf{s}[n] + \sum_{i=1}^{2}\sqrt{p_i[n]}\,\mathbf{H}_{3i}[n]\mathbf{B}_i[n]\mathbf{W}_i^H[n]\mathbf{y}_i[n-1] + \mathbf{z}_3[n].
\end{aligned}
$$

To cancel the interference at $R_1$ and $R_2$ in the $n$th time slot, (15) should be satisfied, that is, $\mathbf{W}_i^H[n]\mathbf{H}_{i3}[n-1]\mathbf{B}_3[n-1] = \mathbf{0}$ for $i = 1,2$. Meanwhile, $R_3$ should remove the interference signals so that it can forward its $M/2$ desired streams, that is, $\mathbf{W}_3^H[n+1]\mathbf{H}_{3i}[n]\mathbf{B}_i[n] = \mathbf{0}$ for $i = 1,2$. Owing to the reciprocity of the channels between relays and the constant block fading over the $(n-1)$th and $n$th time slots, $\mathbf{H}_{3i}[n] = \mathbf{H}_{i3}^T[n-1]$ for $i = 1,2$. Using this equality, the condition for interference cancelation at $R_3$ can be rewritten as $\mathbf{W}_3^H[n+1]\mathbf{H}_{i3}^T[n-1]\mathbf{B}_i[n] = \mathbf{0}$ for $i = 1,2$. Both conditions for interference cancelation at all relays can be met at once by setting

$$
\begin{aligned}
\mathbf{W}_i[n] &= \mathbf{U}_i, \quad \mathbf{B}_i[n] = \mathbf{W}_i^*[n]\boldsymbol{\xi}_i[n], \quad i = 1,2,\\
\mathbf{W}_3[n+1] &= \mathbf{B}_3^*[n-1]\boldsymbol{\xi}_3[n+1],
\end{aligned}
\tag{24}
$$
where $\mathbf{U}_i$ is an orthonormal matrix spanning the space orthogonal to $\mathbf{H}_{i3}[n-1]\mathbf{B}_3[n-1]$, and $\boldsymbol{\xi}_i[\cdot] \in \mathbb{C}^{M/2\times M/2}$ is a matrix determined by local optimization at $R_i$. Indeed, with this choice, $\mathbf{W}_3^H[n+1]\mathbf{H}_{i3}^T[n-1]\mathbf{B}_i[n] = \boldsymbol{\xi}_3^H[n+1]\big(\mathbf{W}_i^H[n]\mathbf{H}_{i3}[n-1]\mathbf{B}_3[n-1]\big)^T\boldsymbol{\xi}_i[n] = \mathbf{0}$. Once $R_3$ decides the forward matrix $\mathbf{B}_3[n-1]$ for its desired signal, $R_1$ and $R_2$ can find backward matrices $\mathbf{W}_1[n]$ and $\mathbf{W}_2[n]$ orthogonal to $\mathbf{H}_{13}[n-1]\mathbf{B}_3[n-1]$ and $\mathbf{H}_{23}[n-1]\mathbf{B}_3[n-1]$, respectively. The forward matrices $\mathbf{B}_1[n]$ and $\mathbf{B}_2[n]$ are then obtained from $\mathbf{W}_1[n]$ and $\mathbf{W}_2[n]$, while $R_3$ uses $\mathbf{B}_3[n-1]$ to determine the backward matrix $\mathbf{W}_3[n+1]$, which is orthogonal to $\mathbf{H}_{31}[n]\mathbf{B}_1[n]$ and $\mathbf{H}_{32}[n]\mathbf{B}_2[n]$. With this setting, each relay needs only its own local channel information, not the other relays' channel information, to align the interference signals. We first optimize the forward matrix $\mathbf{B}_3[n-1]$ for the $(n-1)$th time slot to maximize the achievable sum-rate over the S-$R_3$-D link. Based on it, $\boldsymbol{\xi}_1[n]$ and $\boldsymbol{\xi}_2[n]$ are locally optimized after the backward filters are determined. First, note that the backward matrix for the $(n-1)$th time slot, $\mathbf{W}_3[n-1]$, is already given, since it is determined by the previous forward matrix. $R_3$ computes the amplifying matrix $\mathbf{F}_3[n-1]$ to maximize the achievable rate based on the backward and forward channel information for the $(n-1)$th time slot, $\mathbf{H}_{3S}[n-2]$ and $\mathbf{H}_{D3}[n-1]$, as well as the backward channel information $\mathbf{H}_{3S}[n]$, which affects the $(n+1)$th time slot since $\mathbf{B}_3[n-1]$ is related to $\mathbf{W}_3[n+1]$ as in (24). The MSE matrix of the data vector at D over the S-$R_3$-D link in (6) for the $(n-1)$th time slot can be rewritten as



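$$ \mathbf{E}[n-1,n-2] = \big(\mathbf{I}[n-1] + p_t[n-2]\bar{\mathbf{H}}^H[n-1,n-2]\boldsymbol{\Sigma}_z^{-1}[n-1]\bar{\mathbf{H}}[n-1,n-2]\big)^{-1}, $$

where $\bar{\mathbf{H}}[n-1,n-2] = \sqrt{p_3[n-1]}\,\mathbf{H}_{D3}[n-1]\mathbf{B}_3[n-1]\mathbf{W}_3^H[n-1]\mathbf{H}_{3S}[n-2]\mathbf{T}[n-2]$. We emphasize that $\mathbf{B}_3[n-1]$ also affects the performance of the data vector for the $(n+1)$th time slot.⁵ In the $(n-1)$th time slot, $R_3$ cannot estimate the forward channel $\mathbf{H}_{D3}[n+1]$, but it can estimate the backward channel $\mathbf{H}_{3S}[n]$. We thus introduce a new objective function that measures the performance over the S-$R_3$ link when the transmit precoding matrix $\mathbf{T}[n]$ and the backward matrix $\mathbf{W}_3[n+1]$ are used. Multiplying the received signal at $R_3$ in the odd time slot, given in (1), by the backward matrix, the postprocessed signal can be calculated as

$$ \mathbf{y}_p = \mathbf{W}_3^H[n+1]\mathbf{y}_3[n] = \sqrt{p_t[n]}\,\bar{\mathbf{H}}_p\mathbf{s}[n] + \mathbf{z}_p, \tag{25} $$

where we define $\bar{\mathbf{H}}_p = \mathbf{W}_3^H[n+1]\mathbf{H}_{3S}[n]\mathbf{T}[n]$ and $\mathbf{z}_p = \mathbf{W}_3^H[n+1]\mathbf{z}_3[n]$, with $\boldsymbol{\Sigma}_p = E[\mathbf{z}_p\mathbf{z}_p^H] = \sigma_3^2\mathbf{W}_3^H[n+1]\mathbf{W}_3[n+1]$. Applying an MMSE filter to the postprocessed signal, its MSE matrix and achievable sum-rate can be evaluated as $\mathbf{E}_p = \big(\mathbf{I}_{M/2} + p_t[n]\bar{\mathbf{H}}_p^H\boldsymbol{\Sigma}_p^{-1}\bar{\mathbf{H}}_p\big)^{-1}$ and $I_p = \log_2\det\mathbf{E}_p^{-1}$, where $\bar{\mathbf{H}}_p = \boldsymbol{\xi}_3^H[n+1]\mathbf{B}_3^T[n-1]\mathbf{H}_{3S}[n]\mathbf{T}[n]$. We arbitrarily set $\boldsymbol{\xi}_3[n+1] = \mathbf{I}_{M/2}$, since an invertible $\boldsymbol{\xi}_3[n+1]$ does not affect $\mathbf{E}_p$. From now on, we omit the time index for convenience and use the notation listed in the previous section. Using $\mathbf{B}_3$, the MSE matrix for the $n$th time slot⁶ can be represented as

$$ \mathbf{E}_p = \Big(\mathbf{I}_{M/2} + \frac{p_o}{\sigma_3^2}\mathbf{T}_o^H\mathbf{H}_{3S}'^H\mathbf{B}_3^*\big(\mathbf{B}_3^T\mathbf{B}_3^*\big)^{-1}\mathbf{B}_3^T\mathbf{H}_{3S}'\mathbf{T}_o\Big)^{-1}. $$

We formulate the problem of maximizing the achievable sum-rate as $\max_{\mathbf{B}_3}\frac{1}{2}(I_o + I_p)$. To use an iterative algorithm based on the subgradient method, we compute the partial derivative of the achievable sum-rate $f_2 = \frac{1}{2}(I_o + I_p)$ with respect to $\mathbf{B}_3^*$, which is given by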



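$$\frac{\partial f_4}{\partial B_3^*} = \frac{\rho_o}{2\ln 2}\left(\Psi_3 W_3 + \frac{1}{\sigma_3^2}\Upsilon_3 B_3\right), \qquad (26)$$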



where we note that $\Upsilon_3$ is a function of $H'_{3S}$ rather than $H_{3S}$. Given the backward matrix, $W_3$, $R_3$ computes the forward matrix, $B_3$, by using the method of steepest ascent, as shown in the distributed algorithm at the end of this section. For the $n$th time slot, $R_1$ and $R_2$ compute amplifying matrices orthogonal to the inter-relay interference signal from $R_3$ based on the relation in (24), which are given by $F_i = U_i G_i U_i^H$ for $i = 1, 2$. Since $U_i$ can be computed from the received signal of the previous time slot,

5 $B_3[n-1]$ is related to $W_1[n]$ and $W_2[n]$ as in (24), but we cannot consider their joint optimization since we assume that $R_3$ knows only its local CSI.

6 We put $(\cdot)'$ on the channel matrix for the $n$th time slot to distinguish it from that for the $(n-2)$th time slot.
each relay should find $G_i$ to maximize the achievable sum-rate. To find the amplifying matrices for relays that have only local channel information, MMSE and zero-forcing-based filter designs have been proposed in [24]. However, these filters cannot be applied in this scenario because the interference among $M$ data streams cannot be canceled from only $\frac{M}{2}$ received signals through the postprocessing backward channel matrix, $U_i^H H_{iS}$. Since we assume in this scenario that CSI exchange between relays is not possible, we find each $G_i$ to maximize the individual mutual information via $R_1$ and $R_2$, respectively. We assume that the received signal is given by $y_{D,i} = \sqrt{\rho_e \rho_i}\, H_{Di} F_i (H_{iS} T_e s + z_i) + z_D$. We compute the amplifying matrix to maximize the mutual information of the single-relay channel, $f_{e,i} = \log_2\det E_{e,i}^{-1}$, where $E_{e,i} = \big(I_M + \rho_e \rho_i H_{iS}^H F_i^H H_{Di}^H \Sigma_{e,i}^{-1} H_{Di} F_i H_{iS}\big)^{-1}$ and $\Sigma_{e,i} = \sigma_i^2 \rho_i H_{Di} F_i F_i^H H_{Di}^H + \sigma_D^2 I_M$. We consider an iterative algorithm using the method of steepest ascent,$^7$ and the partial derivative of $f_{e,i}$ with respect to $G_i^*$ can be computed as

$$\frac{\partial f_{e,i}}{\partial G_i^*} = \frac{\rho_e}{\ln 2}\, U_i^H \Phi_{e,i} U_i. \qquad (27)$$

In (27), we define $\Phi_{e,i} = \sqrt{\rho_i}\,\Omega_{e,i} - \frac{\rho_i^{3/2}}{P_R}\,\mathrm{Re}\big(\mathrm{tr}\{F_i^H \Omega_{e,i}\}\big) F_i V_i$, where $H_{e,i} = \sqrt{\rho_i}\, H_{Di} F_i H_{iS}$ and $\Omega_{e,i} = H_{Di}^H \Sigma_{e,i}^{-1} H_{e,i} E_{e,i} \big(H_{iS}^H - \sigma_i^2 \sqrt{\rho_i}\, H_{e,i}^H \Sigma_{e,i}^{-1} H_{Di} F_i\big)$. $R_i$, $i = 1, 2$, can compute the amplifying matrix, $F_i = U_i G_i U_i^H$, for the $n$th time slot as shown in the following algorithm. Finally, we obtain the distributed iterative algorithm, which is applied alternately in each time slot based on (24).
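The iterative algorithm mentioned above ascends along the partial derivative with respect to the conjugate variable and stops once the objective changes by less than $\epsilon$. The sketch below shows only that update-and-stop pattern. It makes three assumptions. It uses a fixed step size. It works on a scalar complex variable rather than a matrix. It replaces $f_{e,i}(G_i)$ with a hypothetical toy objective, $f(z) = \ln(1+|z|^2) - |z|^2/2$, a rate-like gain minus a power-like penalty. The function names are illustrative and do not come from the paper.

```python
import math


def steepest_ascent(f, grad_conj, z0, mu=0.5, eps=1e-10, max_iter=1000):
    """Fixed-step steepest ascent z <- z + mu * df/dz*.

    Stops when |f(z_{k+1}) - f(z_k)| < eps, mirroring the termination rule of the
    iterative algorithms. Returns (z, f(z), number of iterations used).
    """
    z, f_old = z0, f(z0)
    for k in range(max_iter):
        z = z + mu * grad_conj(z)  # step along the conjugate (Wirtinger) gradient
        f_new = f(z)
        if abs(f_new - f_old) < eps:
            return z, f_new, k + 1
        f_old = f_new
    return z, f_old, max_iter


# Hypothetical toy objective whose maximum lies on the circle |z| = 1.
def toy_f(z):
    return math.log(1 + abs(z) ** 2) - abs(z) ** 2 / 2


def toy_grad_conj(z):
    # d/dz* of ln(1 + z z*) - z z*/2
    return z / (1 + abs(z) ** 2) - z / 2


z_opt, f_opt, iters = steepest_ascent(toy_f, toy_grad_conj, 0.3 + 0.1j)
```

The step size is held fixed here for clarity. The numerical section instead selects it by Armijo's rule.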



Distributed Algorithm

Case I: even $n$
Initialization
1) Given the matrices $T_o = I_{M\times\frac{M}{2}}$ and $W_3$, initialize the matrix $B_3^{[k]}$ for $k = 0$.
Iteration
2) Compute the partial derivative $\partial f_4(B_3^{[k]})/\partial B_3^*$ and update the matrix $B_3^{[k+1]} = B_3^{[k]} + \mu^{[k]}\, \partial f_4(B_3^{[k]})/\partial B_3^*$.
3) If $|f_4^{[k+1]} - f_4^{[k]}| < \epsilon$, stop the iteration. Otherwise, set $k \leftarrow k+1$ and repeat 2).
Results
4) Output the amplifying matrix $F_3^{[k+1]}$ for the $n$th time slot and the backward matrix $W_3 = B_3^{[k+1]}$ for the $(n+2)$th time slot.

Case II: odd $n$
Initialization for the $i$th relay ($i = 1, 2$)
1) Find the matrix $U_i$ orthogonal to $H_{i3}B_3$.
2) Initialize the matrix $G_i^{[k]}$ for $k = 0$.
Iteration
3) Compute the partial derivative $\partial f_{e,i}(G_i^{[k]})/\partial G_i^*$ and update the matrix $G_i^{[k+1]} = G_i^{[k]} + \mu^{[k]}\, \partial f_{e,i}(G_i^{[k]})/\partial G_i^*$.
4) If $|f_{e,i}^{[k+1]} - f_{e,i}^{[k]}| < \epsilon$, stop the iteration. Otherwise, set $k \leftarrow k+1$ and repeat 3).
Results
5) Output the matrix $F_i = U_i G_i^{[k+1]} U_i^H$ for transmission during the odd time slot.

7 There have been several algorithms to optimize the relay filter for a single MIMO relaying channel [25], [26]. These methods can also be applied to find the amplifying filters.
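Step 1) of Case II only needs an orthonormal basis for the subspace orthogonal to the aligned interference $H_{i3}B_3$. With that basis, $U_i^H H_{i3} B_3 = 0$. The sketch below is a minimal version that assumes the columns of $H_{i3}B_3$ are already available as vectors. It uses modified Gram-Schmidt; an SVD or QR factorization would serve equally well. The helper names `inner` and `orth_complement` are hypothetical.

```python
import math


def inner(u, v):
    """Inner product u^H v of two complex vectors."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def orth_complement(cols, n, tol=1e-10):
    """Orthonormal basis (list of length-n vectors) of span(cols)^perp in C^n."""
    basis = []  # orthonormal basis of everything added so far

    def add(v):
        w = list(v)
        for q in basis:  # modified Gram-Schmidt: remove the component along q
            c = inner(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        if norm > tol:  # keep only directions that are new
            basis.append([x / norm for x in w])

    for c in cols:  # first span the interference subspace itself
        add(c)
    span_dim = len(basis)
    for j in range(n):  # then complete the basis with standard unit vectors
        add([1.0 if i == j else 0.0 for i in range(n)])
    return basis[span_dim:]


# Hypothetical example: M = 4 antennas, interference occupying M/2 = 2 dimensions.
interference_cols = [[1, 1j, 0, 2], [0, 1, 1, -1j]]
U = orth_complement(interference_cols, 4)
```

The returned vectors form the columns of $U_i$. In this example, two directions remain free of inter-relay interference.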

2) Scenario 2: Flat Fading per One Time Slot: In this section, we first consider a filter design for a block fading scenario in which the channel varies every time slot, i.e., $H_{ij}[n] \neq H_{ij}[n-1]$ for any $n$. For the $n$th data symbols, the transmit precoding filter, $T[n]$, at S and the amplifying filters, $F_i[n+1]$, at the relays cannot be jointly optimized since the forward channel for the next time slot cannot be estimated at present. Recalling (12), the backward filter, $W_3[n]$, is jointly optimized with the filters $B_1[n-1]$ and $B_2[n-1]$ for the $(n-1)$th messages, since it cancels the inter-relay interference induced by the $(n-1)$th message in (10). On the other hand, $B_3[n]$ is jointly optimized with $W_1[n+1]$ and $W_2[n+1]$ due to the constraints in (15). Global CSI is required to perform this joint optimization.

First, we focus on the optimization for even $n$ to find $T[n]$, $W_1[n+1]$, $W_2[n+1]$, and $B_3[n]$. We note that the transmit precoding filter, $T[n-1]$, and the backward filter, $W_3[n]$, are given through the optimization for the previous odd time slot. We need to introduce a new objective function because, due to the nature of block fading, we cannot estimate the channel matrices $H_{D1}[n+1]$ and $H_{D2}[n+1]$ for the next odd time slot and therefore cannot use the MSE matrix $E[n+1,n]$. The received signals at $R_1$ and $R_2$ for the even time slot are given in (11), and multiplying them by the backward matrices yields

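$$\begin{bmatrix} W_1^H[n+1]\, y_1[n] \\ W_2^H[n+1]\, y_2[n] \end{bmatrix} = \sqrt{\rho_t[n]}\, H_c s[n] + z_c,$$

where the compound channel matrix and noise are defined as

$$H_c = \begin{bmatrix} W_1^H[n+1] H_{1S}[n] \\ W_2^H[n+1] H_{2S}[n] \end{bmatrix} T[n], \qquad z_c = \begin{bmatrix} W_1^H[n+1] z_1[n] \\ W_2^H[n+1] z_2[n] \end{bmatrix}.$$

We define the covariance matrix of the noise vector $z_c$ as

$$\Sigma_c = \mathbb{E}[z_c z_c^H] = \begin{bmatrix} \sigma_1^2 W_1^H[n+1] W_1[n+1] & 0 \\ 0 & \sigma_2^2 W_2^H[n+1] W_2[n+1] \end{bmatrix}.$$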
We use the MSE matrix of this compound received signal as part of the objective function for optimizing the linear filters in the even time slot. We assume an MMSE linear filter for this compound signal, $W_c = (\rho_t[n] H_c H_c^H + \Sigma_c)^{-1} \sqrt{\rho_t[n]}\, H_c$, although it is not actually used in the system. The MSE matrix for this compound signal can be computed as $E_c = (I + \rho_t[n] H_c^H \Sigma_c^{-1} H_c)^{-1}$. From now on, we use the simple notation listed in the previous section for convenience. We define the achievable sum-rate for the compound signal as $I_c = \log_2\det E_c^{-1}$, where

$$E_c = \left(I + \sum_{i=1}^{2} \frac{\rho_e}{\sigma_i^2}\, T_e^H H_{iS}^H W_i W_i^H H_{iS} T_e\right)^{-1}.$$
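$E_c$ and the other MSE matrices in this section share the form $E = (I + \rho H^H \Sigma^{-1} H)^{-1}$, with rate $\log_2\det E^{-1}$. The sketch below evaluates that pair for a 2×2 effective channel $H$ and noise covariance $\Sigma$. It uses plain Python with hand-written 2×2 helpers (hypothetical names). Its only purpose is to make the formula concrete. It does not reproduce the paper's simulations, and the channel and covariance values are made up.

```python
import math


def mat_mul(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def herm(a):
    """Conjugate transpose A^H of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv(a):
    d = det(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def mmse_rate(h, sigma, rho):
    """Return (E, I) with E = (I + rho H^H Sigma^{-1} H)^{-1} and I = log2 det E^{-1}."""
    g = mat_mul(herm(h), mat_mul(inv(sigma), h))
    e_inv = [[(1.0 if i == j else 0.0) + rho * g[i][j] for j in range(2)] for i in range(2)]
    # E^{-1} is Hermitian positive definite, so its determinant is real and positive.
    rate = math.log2(det(e_inv).real)
    return inv(e_inv), rate


# Hypothetical 2x2 effective channel and coloured noise covariance.
E, rate = mmse_rate([[1.0, 0.5j], [0.2, 1.0]], [[1.0, 0.0], [0.0, 2.0]], rho=10.0)
```

The code forms $E^{-1}$ directly and takes the rate from its determinant. This matches the way $I_c = \log_2\det E_c^{-1}$ is defined.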



We formulate the problem of maximizing the total achievable sum-rate for the even time slot as

$$\max_{W_1, W_2, B_3, T_e}\ \frac{1}{2}(I + I_c) \quad \text{s.t.}\ \ \mathrm{span}(H_{i3}B_3) \perp \mathrm{span}(W_i),\ i = 1, 2,$$

where $T_o$ and $W_3$ are given from the previous slot. Recalling (16), this problem is reformulated as

$$\max_{U_w, \phi_3, T_e}\ \frac{1}{2}\left(\log_2\det E^{-1} + \log_2\det E_c^{-1}\right),$$

where $\psi_1$ and $\psi_2$ are irrelevant to the optimization since they do not affect the MSE matrix of the compound signal; therefore, we simply set $\psi_1 = \psi_2 = I_{\frac{M}{2}}$. In the same manner as the previously proposed algorithm, we use an iterative algorithm based on the subgradient method. We compute the partial derivative of $f_2$ with respect to each matrix, taking a step toward a local maximum in each iteration, where we define $f_2 = \frac{1}{2}(I + I_c)$. The partial derivatives of $f_2$ with respect to $U_w^*$, $\phi_3^*$, and $T_e^*$ are given by$^8$

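$$\frac{\partial f_2}{\partial U_w^*} = \frac{1}{2\ln 2}\left(\frac{\partial \ln\det E^{-1}}{\partial U_w^*} + \frac{\partial \ln\det E_c^{-1}}{\partial U_w^*}\right), \qquad (28)$$

$$\frac{\partial f_2}{\partial \phi_3^*} = \frac{1}{2\ln 2}\,\frac{\partial \ln\det E^{-1}}{\partial \phi_3^*}, \qquad (29)$$

$$\frac{\partial f_2}{\partial T_e^*} = \frac{\rho_e}{2\ln 2}\sum_{i=1}^{2}\frac{1}{\sigma_i^2}\left(H_{iS}^H W_i W_i^H H_{iS} T_e E_c - \frac{\rho_e}{P_S}\,\mathrm{tr}\{\Upsilon_i W_i W_i^H\}\, T_e\right), \qquad (30)$$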



where we denote $\Upsilon_i = H_{iS} T_e E_c T_e^H H_{iS}^H$ for $i = 1, 2$. Finally, we describe the proposed iterative algorithm for the even time slot, which successively applies the subgradient method to each optimizing matrix, in Case I of the algorithm at the end of this section.

On the other hand, the optimization for odd $n$ is to find $T[n]$, $B_1[n]$, $B_2[n]$, and $W_3[n+1]$ maximizing the achievable sum-rate for the odd time slot when the backward filters and the transmit precoding filter, $W_1[n]$, $W_2[n]$, and $T[n-1]$, are given from the previous time slot. In the same way as the optimization for

8 In Appendix B, we describe the derivation of the partial derivatives in detail.



July 17, 2012 DRAFT



the even time slot, we cannot use the MSE matrix for the S-R$_3$-D link since we cannot know the channel between $R_3$ and D at present. Recalling the postprocessed received signal in (25) and adopting an MMSE filter for it, its MSE matrix and achievable sum-rate can be evaluated as $E_p = (I + \rho_t[n] H_p^H \Sigma_p^{-1} H_p)^{-1}$ and $I_p = \log_2\det E_p^{-1}$. Using simple notation without time indices, we can formulate the optimization problem for the odd time slot as

$$\max_{B_1, B_2, W_3, T_o}\ \frac{1}{2}(I_e + I_p) \quad \text{s.t.}\ \ \mathrm{span}(W_3) \perp \mathrm{span}(H_{31}B_1) = \mathrm{span}(H_{32}B_2),$$

where $T_e$, $W_1$, and $W_2$ are given from the previous time slot. Recalling (14), the above problem can be reformulated as

$$\max_{U_b, \phi_1, \phi_2, T_o}\ \frac{1}{2}\left(\log_2\det E_e^{-1} + \log_2\det E_p^{-1}\right),$$

where $E_p = \big(I + \frac{\rho_o}{\sigma_3^2} T_o^H H_{3S}^H W_3 W_3^H H_{3S} T_o\big)^{-1}$, and we set $\psi_3 = I$ since $I_p$ does not depend on $\psi_3$. The partial derivatives of the achievable rate, $f_3 = \frac{1}{2}(I_e + I_p)$, with respect to $U_b^*$, $\phi_1^*$, $\phi_2^*$, and $T_o^*$ can be obtained as

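$$\frac{\partial f_3}{\partial U_b^*} = \frac{1}{2\ln 2}\left(\frac{\partial \ln\det E_e^{-1}}{\partial U_b^*} + \frac{\partial \ln\det E_p^{-1}}{\partial U_b^*}\right), \qquad (31)$$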

$$\frac{\partial f_3}{\partial \phi_i^*} = \frac{\rho_e}{2\ln 2}\, U_b^H H_{3i}^{-H} \Psi_i W_i, \quad i = 1, 2, \qquad (32)$$

$$\frac{\partial f_3}{\partial T_o^*} = \frac{\rho_o}{2\ln 2}\left(-\frac{\rho_o}{\sigma_3^2 P_S}\,\mathrm{tr}\{\Upsilon_3 W_3 W_3^H\}\, T_o + \frac{1}{\sigma_3^2} H_{3S}^H W_3 W_3^H H_{3S} T_o E_p\right), \qquad (33)$$

where we denote $\Upsilon_3 = H_{3S} T_o E_p T_o^H H_{3S}^H$. For the odd time slot, we present the iterative algorithm in Case II of the following algorithm, which finds the above matrices by using the subgradient method when $T_e$, $W_1$, and $W_2$ are given from the previous even slot. Finally, the proposed algorithm for the block fading channel alternately applies the iterative algorithm to find the matrices used in the present time slot, while the matrices optimized in the previous time slot are kept fixed.

Iterative Algorithm II

Case I: even $n$
Initialization
1) Given the matrices $T_o$ and $W_3$, initialize the matrices $U_w^{[k]}$, $\phi_3^{[k]}$, and $T_e^{[k]}$ for $k = 0$.
Iteration
2) Compute the partial derivative $\partial f_2(U_w^{[k]})/\partial U_w^*$ and update the matrix $U_w^{[k+1]} = U_w^{[k]} + \mu^{[k]}\,\partial f_2(U_w^{[k]})/\partial U_w^*$.
3) Compute the partial derivative $\partial f_2(\phi_3^{[k]})/\partial \phi_3^*$ and update the matrix $\phi_3^{[k+1]} = \phi_3^{[k]} + \mu^{[k]}\,\partial f_2(\phi_3^{[k]})/\partial \phi_3^*$.
4) Compute the partial derivative $\partial f_2(T_e^{[k]})/\partial T_e^*$ and update the matrix $T_e^{[k+1]} = T_e^{[k]} + \mu^{[k]}\,\partial f_2(T_e^{[k]})/\partial T_e^*$.
5) If $|f_2^{[k+1]} - f_2^{[k]}| < \epsilon$, stop the iteration. Otherwise, set $k \leftarrow k+1$ and repeat 2)-4).
Results
6) Output the matrices $F_3^{[k+1]}$, $T_e^{[k+1]}$, $W_1^{[k+1]}$, and $W_2^{[k+1]}$ for transmission during the even time slot.

Case II: odd $n$
Initialization
1) Given the matrices $T_e$, $W_1$, and $W_2$, initialize the matrices $U_b^{[k]}$, $\phi_1^{[k]}$, $\phi_2^{[k]}$, and $T_o^{[k]}$ for $k = 0$.
Iteration
2) Compute the partial derivative $\partial f_3(U_b^{[k]})/\partial U_b^*$ and update the matrix $U_b^{[k+1]} = U_b^{[k]} + \mu^{[k]}\,\partial f_3(U_b^{[k]})/\partial U_b^*$.
3) Compute the partial derivative $\partial f_3(\phi_i^{[k]})/\partial \phi_i^*$ and update each matrix $\phi_i^{[k+1]} = \phi_i^{[k]} + \mu^{[k]}\,\partial f_3(\phi_i^{[k]})/\partial \phi_i^*$ for $i = 1, 2$.
4) Compute the partial derivative $\partial f_3(T_o^{[k]})/\partial T_o^*$ and update the matrix $T_o^{[k+1]} = T_o^{[k]} + \mu^{[k]}\,\partial f_3(T_o^{[k]})/\partial T_o^*$.
5) If $|f_3^{[k+1]} - f_3^{[k]}| < \epsilon$, stop the iteration. Otherwise, set $k \leftarrow k+1$ and repeat 2)-4).
Results
6) Output the matrices $F_1$, $F_2$, $T_o$, and $W_3$ for transmission during the odd time slot.
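Each iteration of Iterative Algorithm II updates several matrices in turn. Every update uses the latest values of the others, and the same $\epsilon$ rule ends the loop. The sketch below is a minimal version of that block-coordinate (Gauss-Seidel) ascent pattern under stated assumptions: the blocks are complex scalars, the step size is fixed, and a hypothetical concave toy objective in two blocks stands in for $f_2$ or $f_3$. The names are illustrative.

```python
def block_coordinate_ascent(f, grads, blocks, mu=0.5, eps=1e-12, max_rounds=10000):
    """Cycle through the blocks, taking one ascent step per block per round.

    grads[j](blocks) returns df/d(block_j)* at the current blocks. Block j is updated
    using the latest values of all other blocks, and the loop stops when
    |f^{k+1} - f^k| < eps. Returns (blocks, f(blocks), rounds used).
    """
    blocks = list(blocks)
    f_old = f(blocks)
    for r in range(max_rounds):
        for j, grad in enumerate(grads):
            blocks[j] = blocks[j] + mu * grad(blocks)
        f_new = f(blocks)
        if abs(f_new - f_old) < eps:
            return blocks, f_new, r + 1
        f_old = f_new
    return blocks, f_old, max_rounds


# Hypothetical concave toy objective in two complex blocks x and y.
def toy_f(b):
    x, y = b
    return -abs(x - 1) ** 2 - abs(y - 2j) ** 2 - (x.conjugate() * y).real / 2


toy_grads = [
    lambda b: -(b[0] - 1) - b[1] / 4,   # d toy_f / d x*
    lambda b: -(b[1] - 2j) - b[0] / 4,  # d toy_f / d y*
]

(x_opt, y_opt), f_best, rounds = block_coordinate_ascent(toy_f, toy_grads, [0j, 0j])
```

Setting both conjugate gradients to zero gives the toy maximizer in closed form: $x = \frac{16}{15}(1 - 0.5j)$ and $y = 2j - x/4$. The loop reaches this point without any block being optimized jointly with another.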



IV. Numerical Results 

In this section, we present selected Monte Carlo simulation results comparing the performance of the proposed scheme with that of other schemes. We consider a symmetric Rayleigh fading case; that is, each element of the forward and backward channel matrices is an independent and identically distributed complex Gaussian random variable with zero mean and unit variance. For comparison, we assume in our proposed scheme that $P_S = P_R = P$ and $\sigma_i^2 = \sigma_D^2 = \sigma^2$ for $i = 1, 2, 3$, and define the SNR as $\rho = P/\sigma^2$. With our proposed protocol, we consider two different filter designs: iterative algorithm I in Section III-A and the naive filter, which simply sets $T_e = I_M$, $T_o = U_b = U_w = \phi_3 = \psi_3 = I_{M\times\frac{M}{2}}$, and $\phi_1 = \phi_2 = \psi_1 = \psi_2 = I_{\frac{M}{2}}$. For comparison, we also consider three different schemes in the conventional half-duplex mode. First, in the relay cooperation scheme, all relays fully cooperate to forward the data; i.e., they share all information and act as one relay equipped with $3M$ antennas. Second, we consider the best relay selection scheme, which selects the one relay among the three that maximizes the sum-rate. In addition, conventional AF relaying using a single antenna is considered. In these three schemes, the source and relay filters that maximize the sum-rate are determined by using the unified framework in [26]. For a fair comparison under a total power constraint at the source and relays over two time slots, we assume that $P_S = 2P$ and $P_R = 3P$ for the best relay selection and conventional relaying schemes, and $P_S = 2P$ with a total transmit power of $3P$ over all the relays for the relay cooperation scheme, unless otherwise noted. We determine each step size, $\mu$, for our proposed iterative schemes based on Armijo's rule, with termination parameter $\epsilon = 10^{-2}$, while the other iterative schemes used for comparison use $\epsilon = 10^{-4}$ for termination.
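The step sizes $\mu$ are chosen by Armijo's rule. The sketch below is a minimal backtracking version for an ascent step along $g = \partial f/\partial z^*$. It assumes a complex scalar variable, so the first-order increase along $g$ is $2\mu|g|^2$. The constants $\beta = 0.5$ and $\sigma = 10^{-4}$ are typical choices but hypothetical here, since the paper does not state its Armijo parameters.

```python
def armijo_step(f, z, g, mu0=1.0, beta=0.5, sigma=1e-4, max_backtracks=50):
    """Backtracking (Armijo) step size for the ascent update z + mu * g, g = df/dz*.

    For a real-valued f of a complex variable, moving along g raises f by about
    2 * mu * |g|^2, so the first mu = mu0 * beta**m satisfying
    f(z + mu g) >= f(z) + sigma * 2 * mu * |g|^2 is accepted.
    """
    f0 = f(z)
    slope = 2.0 * abs(g) ** 2
    mu = mu0
    for _ in range(max_backtracks):
        if f(z + mu * g) >= f0 + sigma * mu * slope:
            return mu
        mu *= beta  # shrink the step and try again
    return mu


# Toy check: f(z) = -|z - 1|^2 has g = -(z - 1); from z = 3, mu = 4 and mu = 2 overshoot.
def toy_f(z):
    return -abs(z - 1) ** 2


z0 = 3 + 0j
step = armijo_step(toy_f, z0, -(z0 - 1), mu0=4.0)
```

Starting from an optimistic $\mu_0$ and halving keeps each iteration monotone in the objective. This is the property the $|f^{[k+1]} - f^{[k]}| < \epsilon$ stopping test relies on.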

In Fig. 2, we present the outage probability of five different schemes with $M = 4$ for the slow fading channel. The outage probability is defined as $P_{out} = \Pr\big[\frac{1}{2}(I_o + I_e) < I_{out}\big]$, where $I_{out}$ denotes the outage threshold; we assume $I_{out} = 2$ [bits/s/Hz] in this paper. The relay cooperation scheme provides better outage probability than the other schemes since it exploits the full diversity gain over the $M \times 3M$ forward channel and the $3M \times M$ backward channel through full relay cooperation. The best relay selection scheme has the same diversity order as the relay cooperation scheme but a different power gain, since the relays do not cooperate in forwarding the data and only one relay is active during the transmission. Conventional AF relaying and the naive filter give worse outage probability than our proposed iterative algorithm I at the given outage threshold, and the naive filter in the proposed protocol is even worse than the conventional scheme. However, this figure shows that designing the source and relay filters with the proposed iterative algorithm I improves the power gain significantly. Although the proposed iterative algorithm I cannot obtain the same diversity gain as the relay cooperation and best relay selection schemes due to the diversity-multiplexing tradeoff, it gives robust outage performance compared with the other schemes over the SNR range of interest.

Now we present the $\epsilon$-outage achievable rate of different transmission strategies with $M = 2$ and $M = 4$ for the slow fading channel in Fig. 3. The $\epsilon$-outage achievable rate is defined as $I_\epsilon = \max I_{out}$ subject to $P_{out}(I_{out}) \le \epsilon$, where we set $\epsilon = 0.1$ in this paper. The naive filter provides a worse outage sum-rate than the conventional relaying schemes in the low and moderate SNR regions. On the other hand, our proposed scheme remains superior to the conventional schemes over the whole SNR range of interest. Hence, our proposed iterative algorithm I is well suited to the slow fading environment.
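The two outage metrics defined above can be computed directly from Monte Carlo samples of the sum-rate. In the paper, each sample would be the per-realization value of $\frac{1}{2}(I_o + I_e)$. The sketch below is minimal. The SISO Rayleigh rate in its usage lines is a hypothetical stand-in and is not one of the compared schemes.

```python
import math
import random


def outage_probability(rates, i_out):
    """Empirical P_out(I_out) = Pr[rate < I_out] over Monte Carlo rate samples."""
    return sum(1 for r in rates if r < i_out) / len(rates)


def eps_outage_rate(rates, eps):
    """Largest threshold I with empirical P_out(I) <= eps, for 0 <= eps < 1."""
    s = sorted(rates)
    # At most floor(eps * N) samples may lie strictly below the threshold;
    # the small offset guards against products such as 0.29 * 100 = 28.999...
    k = math.floor(eps * len(s) + 1e-9)
    return s[min(k, len(s) - 1)]


# Hypothetical usage: toy SISO Rayleigh link at 10 dB, |h|^2 ~ Exp(1).
random.seed(1)
toy_rates = [math.log2(1 + 10.0 * random.expovariate(1.0)) for _ in range(10000)]
p_out = outage_probability(toy_rates, 2.0)
i_eps = eps_outage_rate(toy_rates, 0.1)
```

Choosing the $\lfloor \epsilon N \rfloor$-th smallest sample is the empirical counterpart of $\max I_{out}$ subject to $P_{out}(I_{out}) \le \epsilon$. Any larger threshold would put more than $\epsilon N$ samples in outage.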

In Fig. 4, we present the achievable DOFs and the ergodic sum-rate of three different schemes (the proposed scheme with the naive filter, the best relay selection scheme, and the conventional AF relaying scheme) for the flat fading channel per two time slots. Since the aim here is to verify that our proposed scheme improves the capacity pre-log factor, we assume that all transmission strategies use a naive filter and $P_S = P_R = P$ as the power constraint at the source and relays. The capacity pre-log factor is defined as $\eta = \lim_{\rho\to\infty} \frac{I(\rho)}{\log_2\rho}$, where $I(\rho)$ is the system sum-rate at SNR $\rho$. Given a naive filter design, we can analytically compute $\eta_e = \lim_{\rho\to\infty} \frac{I_e(\rho)}{\log_2\rho} = M$ for even $n$ and $\eta_o = \lim_{\rho\to\infty} \frac{I_o(\rho)}{\log_2\rho} = \frac{M}{2}$ for odd $n$.$^9$ The achievable DOFs of our proposed protocol are therefore $\eta = \frac{1}{2}(\eta_e + \eta_o) = \frac{3M}{4}$. For the two antenna configurations, $M = 2$ and $M = 4$, these figures numerically show that our proposed scheme obtains $\frac{3M}{4}$ DOFs, while the existing schemes using the conventional half-duplex protocol achieve $\frac{M}{2}$ DOFs. Hence, our proposed scheme provides $\frac{M}{4}$ additional DOFs over the conventional schemes by exploiting alternate relaying and IA.
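A short worked instance of this DOF count follows. It takes $\eta_e = M$ for even slots and $\eta_o = \frac{M}{2}$ for odd slots, and credits the conventional half-duplex schemes with $\frac{M}{2}$ DOFs:

```latex
\begin{aligned}
\eta &= \tfrac{1}{2}\left(\eta_e + \eta_o\right) = \tfrac{1}{2}\left(M + \tfrac{M}{2}\right) = \tfrac{3M}{4},
\qquad \eta - \tfrac{M}{2} = \tfrac{M}{4},\\
\eta\big|_{M=2} &= \tfrac{1}{2}(2 + 1) = 1.5 \quad (\text{vs. } 1 \text{ for the conventional schemes}),\\
\eta\big|_{M=4} &= \tfrac{1}{2}(4 + 2) = 3 \quad (\text{vs. } 2 \text{ for the conventional schemes}).
\end{aligned}
```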

Fig. 5 illustrates the sum-rate performance of several linear filters for flat fading per two time slots in Section III-B1. For comparison, the transmit filter at the source is not considered in iterative algorithm II either, and we focus on the efficiency of the amplifying matrix at the relays. This figure shows that the proposed distributed algorithm obtains a 2 dB power gain over the naive filter. Although the distributed algorithm has a slight loss compared with iterative algorithm II, the former can be performed locally at each relay and requires only local CSI, whereas the latter requires global CSI for all the relays. Therefore, when only the amplifying filter at the relay side is considered, the proposed distributed algorithm is an efficient way to implement the relays without the costly feedback load of CSI exchange between relays.

In Figs. 6 and 7, we present the sum-rate performance of three linear filters applied to our proposed protocol based on alternate relaying and IA for flat fading per one time slot in Section III-B2. For comparison, we present the performance of the proposed protocol using previous schemes, namely an iterative IA algorithm for inter-relay IA and the iterative algorithm in [23] for source and relay filter design, which is labeled iterative IA in all figures. As shown for $M = 4$ in Fig. 6, our proposed iterative algorithm II gives a gain of nearly 5 dB over the naive filter and 3 dB over iterative IA across the whole medium and high SNR region. For $M = 2$, we obtain gains of more than 3.5 dB and 2.5 dB over those schemes, which shows that the power gain from optimizing the source and relay filters increases with the number of antennas, $M$. We note that our proposed scheme computes source and relay filters that not only align the inter-relay interference into the subspace that maximizes the sum-rate but also optimize the sum-rate of the source-to-destination channel for the even and odd time slots, respectively. On the other hand, since the iterative IA scheme focuses only on nulling the interference between relays regardless of the sum-rate, it cannot find the aligned interference subspace that maximizes the sum-rate and loses a significant gain relative to our proposed scheme. The naive filter cannot obtain any of the power gain that results from optimizing the source and relay filters for each time slot.

9 Although there is a slight rate loss in the initial phase because no data are forwarded to the destination, it is negligible when computing the DOFs over a long transmission time. For instance, the rate loss during $N$ time slots is $\frac{1}{N}O(\log_2\rho)$ whether the initialization starts at an odd or an even time slot. As $N$ increases, the rate loss goes to zero.

Next, we illustrate in Fig. 7 the sum-rate improvement of three linear filters with respect to the number of antennas, $M$, for different SNR values. As shown in this figure, the sum-rate of the proposed algorithm increases with a larger slope than that of the other two schemes. As the number of antennas per node increases, our proposed scheme provides a proportionally increasing power gain over the naive filter. On the other hand, the power gain of the iterative IA scheme over the naive filter remains nearly constant regardless of the number of antennas per node.

Finally, we present the convergence curves of the sum-rate for the proposed algorithms with respect to the number of iterations in Fig. 8. For SNR = 30 dB, the three proposed schemes in Figs. 8(a), 8(b), and 8(c) were run in the specific fading scenarios described in Sections III-A, III-B1, and III-B2, respectively. These results reveal that most of the proposed algorithms reach a sum-rate close to their final outputs within about 10 iterations, while Case II of the distributed algorithm converges very quickly.

V. Conclusion 

In this paper, we investigated a two-hop AF MIMO relaying network in which three half-duplex relays help forward the message to the destination. An alternate relaying protocol and an IA scheme were adopted to compensate for the inherent capacity pre-log penalty of $\frac{1}{2}$. The inter-relay interference incurred by the alternate protocol was aligned into reduced spatial dimensions and completely canceled at the relays. We aimed at optimizing the source and relay filters to maximize the system achievable sum-rate and provided suboptimal solutions for different fading scenarios. Our proposed scheme achieves $\frac{3M}{4}$ DOFs, while the conventional AF relaying schemes provide $\frac{M}{2}$ DOFs. Our simulation results showed that the proposed filter designs are suitable for each fading scenario and yield significant improvement over the naive filter, the iterative IA scheme, and conventional half-duplex relaying schemes.

Generalizing the proposed system to an arbitrary number of relays is our future work. Intuitively, the achievable DOFs will increase as the number of relays increases. We will investigate feasible strategies for properly distributing the achievable DOFs among the relays in each time slot.

Appendix A 

Partial Derivative of $f_z(Z, Z^*) = \ln\det\big(I + \rho_t H_z^H \Sigma_z^{-1} H_z\big)$

First, we consider the MSE matrix, which is unified as

$$E_z = \left(I + \rho_t H_z^H \Sigma_z^{-1} H_z\right)^{-1}, \qquad (34)$$




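where $H_z = \sum_{i=1}^{l}\sqrt{\rho_i}\, H_{fi} X_i H_{bi}$, $\Sigma_z = \sum_{i=1}^{l}\sigma_i^2\rho_i H_{fi} X_i X_i^H H_{fi}^H + \sigma_D^2 I$, and $\rho_i\,\mathrm{tr}\{X_i V_i X_i^H\} = P_R$ for any $\rho_t$, $P_R$, $\sigma_i^2$, $\sigma_D^2$, $V_i$, $H_{fi}$, and $H_{bi}$. When $X_i$ is a function of $Z$, the differential of $f_z(Z, Z^*)$ is computed as $df_z(Z, Z^*) = \mathrm{tr}\{E_z\, dE_z^{-1}\}$, where we use the property $d\ln\det(Z) = \mathrm{tr}\{Z^{-1} dZ\}$ in [28]. The differential of $E_z^{-1}$ is computed as

$$dE_z^{-1} = \rho_t\sum_{i=1}^{l}\Big[d\sqrt{\rho_i}\,\big(H_{bi}^H X_i^H H_{fi}^H \Sigma_z^{-1} H_z + H_z^H \Sigma_z^{-1} H_{fi} X_i H_{bi}\big) + \sqrt{\rho_i}\,\big(H_{bi}^H dX_i^H H_{fi}^H \Sigma_z^{-1} H_z + H_z^H \Sigma_z^{-1} H_{fi}\, dX_i\, H_{bi}\big)\Big] + \rho_t H_z^H\, d\Sigma_z^{-1} H_z, \qquad (35)$$

where

$$d\rho_i = -\frac{\rho_i^2}{P_R}\,\mathrm{tr}\{dX_i V_i X_i^H + X_i V_i\, dX_i^H\}, \qquad d\sqrt{\rho_i} = -\frac{\rho_i^{3/2}}{2P_R}\,\mathrm{tr}\{dX_i V_i X_i^H + X_i V_i\, dX_i^H\},$$

$$d\Sigma_z^{-1} = -\Sigma_z^{-1}\Big(\sum_{i=1}^{l}\sigma_i^2\big(d\rho_i\, H_{fi} X_i X_i^H H_{fi}^H + \rho_i H_{fi}\, dX_i\, X_i^H H_{fi}^H + \rho_i H_{fi} X_i\, dX_i^H H_{fi}^H\big)\Big)\Sigma_z^{-1}.$$

Plugging (35) into $\mathrm{tr}\{E_z\, dE_z^{-1}\}$ and applying some manipulation yields

$$df_z(Z, Z^*) = \mathrm{tr}\{E_z\, dE_z^{-1}\} = \rho_t\sum_{i=1}^{l}\mathrm{tr}\{\Psi_i\, dX_i^H + \Psi_i^H dX_i\}. \qquad (36)$$

In (36), we define

$$\Psi_i = \sqrt{\rho_i}\,\Omega_i - \frac{\rho_i^{3/2}}{P_R}\,\mathrm{Re}\big(\mathrm{tr}\{X_i^H \Omega_i\}\big)\, X_i V_i, \qquad (37)$$

where $\Omega_i = H_{fi}^H \Sigma_z^{-1} H_z E_z \big(H_{bi}^H - \sigma_i^2\sqrt{\rho_i}\, H_z^H \Sigma_z^{-1} H_{fi} X_i\big)$. In (36), $X_i$ can be replaced with a function of $Z$, and if $df_z(Z, Z^*) = \mathrm{tr}\{A_0^T dZ + A_1^T dZ^*\}$, it can be shown that $\partial f_z/\partial Z = A_0$ and $\partial f_z/\partial Z^* = A_1$ [28]. Since we consider only the partial derivative with respect to $Z^*$ to use the subgradient method, we do not need the differential with respect to $Z$ and omit the first term for convenience, that is,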

$$df_z(Z^*) = \mathrm{tr}\{A_1^T\, dZ^*\} \quad\Longleftrightarrow\quad \frac{\partial f_z}{\partial Z^*} = A_1. \qquad (38)$$
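Every differential in this appendix rests on the property $d\ln\det(Z) = \mathrm{tr}\{Z^{-1} dZ\}$. The following plain-Python sketch is a sanity check, not part of the derivation. It compares a central finite difference of $\ln\det$ with $\mathrm{tr}\{Z^{-1} dZ\}$ for a 2×2 Hermitian positive-definite $Z$. The helper names and the test matrices are hypothetical.

```python
import math


def det2(a):
    """Determinant of a 2x2 matrix given as a list of rows."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    """Inverse of an invertible 2x2 matrix."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def trace_prod(a, b):
    """tr{A B} for 2x2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


def logdet_hpd(a):
    """ln det A for a 2x2 Hermitian positive-definite A (its determinant is real and positive)."""
    return math.log(det2(a).real)


def check_logdet_differential(z, dz, t=1e-6):
    """Central difference of ln det Z along dZ versus the analytic tr{Z^{-1} dZ}."""
    zp = [[z[i][j] + t * dz[i][j] for j in range(2)] for i in range(2)]
    zm = [[z[i][j] - t * dz[i][j] for j in range(2)] for i in range(2)]
    numeric = (logdet_hpd(zp) - logdet_hpd(zm)) / (2 * t)
    analytic = trace_prod(inv2(z), dz)
    return numeric, analytic


# Hypothetical example: Hermitian positive-definite Z and a Hermitian direction dZ.
Z = [[2.0, 0.5 - 0.2j], [0.5 + 0.2j, 1.5]]
dZ = [[0.3, 0.1 + 0.4j], [0.1 - 0.4j, -0.2]]
numeric, analytic = check_logdet_differential(Z, dZ)
```

Both $Z$ and $dZ$ are Hermitian, so $\mathrm{tr}\{Z^{-1} dZ\}$ is real and the two numbers agree to within the finite-difference error.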



Example: Partial Derivatives of $f_1 = \frac{1}{2}(I_o + I_e)$

Let us consider $f_1 = \frac{1}{2\ln 2}\big(\ln\det E_o^{-1} + \ln\det E_e^{-1}\big)$ in Section III-A. When $f_{z1}(Z, Z^*) = \ln\det E_o^{-1}$, we set $l = 1$, $\rho_t = \rho_o$, $H_{f1} = H_{D3}$, $H_{b1} = H_{3S} T_o$, and $X_1 = F_3$, which is parameterized by $U_b$, $U_w$, and $G_3$. Using these settings, with $\Psi_3$ given by (37), we can compute

$$df_{z1}(Z, Z^*) = \rho_o\,\mathrm{tr}\{\Psi_3\, dF_3^H + \Psi_3^H dF_3\}. \qquad (39)$$

For $Z = U_b^*$, the differentials of $F_3$ and $F_3^H$ are taken with respect to $U_b^*$ and substituted into (39), which gives

$$df_{z1}(U_b^*) = \rho_o\,\mathrm{tr}\{\Psi_3\, dF_3^H + \Psi_3^H dF_3\}\big|_{Z = U_b^*}. \qquad (40)$$

In the same way, we can find the differentials with respect to $U_w^*$ and $G_3^*$, which are given by

$$df_{z1}(U_w^*) = \rho_o\,\mathrm{tr}\{\Psi_3\, dF_3^H + \Psi_3^H dF_3\}\big|_{Z = U_w^*} \qquad (41)$$

and

$$df_{z1}(G_3^*) = \rho_o\,\mathrm{tr}\{\Psi_3\, dF_3^H\}\big|_{Z = G_3^*}. \qquad (42)$$

Meanwhile, if $f_{z2}(Z, Z^*) = \ln\det E_e^{-1}$, we set $l = 2$, $\rho_t = \rho_e$, $H_{fi} = H_{Di}$, $H_{bi} = H_{iS} T_e$, and $X_i = F_i$. Plugging in these values, the differential can be rewritten as

$$df_{z2}(Z, Z^*) = \rho_e\sum_{i=1}^{2}\mathrm{tr}\{\Psi_i\, dF_i^H + \Psi_i^H dF_i\}. \qquad (43)$$

For $Z = U_b^*$, we do not need to consider the second term of (43), since we focus only on the differential with respect to $U_b^*$. Substituting the differential of $F_i^H$ with respect to $U_b^*$, the differential of $f_{z2}(U_b^*)$ can be represented as

$$df_{z2}(U_b^*) = \rho_e\sum_{i=1}^{2}\mathrm{tr}\{\Psi_i\, dF_i^H\}\big|_{Z = U_b^*}. \qquad (44)$$

Similarly, for $Z = U_w^*$, discarding the first term of (43) and using the differential of $F_i$ with respect to $U_w^*$, the differential of $f_{z2}(U_w^*)$ can be computed as

$$df_{z2}(U_w^*) = \rho_e\sum_{i=1}^{2}\mathrm{tr}\{\Psi_i^H dF_i\}\big|_{Z = U_w^*}. \qquad (45)$$

In the case of $Z = G_i^*$ for $i = 1, 2$, we consider the differential of $F_i^H$ with respect to $G_i^*$, and the differential $df_{z2}(G_i^*)$ is given by

$$df_{z2}(G_i^*) = \rho_e\,\mathrm{tr}\{\Psi_i\, dF_i^H\}\big|_{Z = G_i^*}. \qquad (46)$$
Since $df_1(Z^*) = \frac{1}{2\ln 2}\big(df_{z1}(Z^*) + df_{z2}(Z^*)\big)$, we can obtain

$$df_1(U_b^*) = \frac{1}{2\ln 2}\big(df_{z1}(U_b^*) + df_{z2}(U_b^*)\big),$$

$$df_1(U_w^*) = \frac{1}{2\ln 2}\big(df_{z1}(U_w^*) + df_{z2}(U_w^*)\big),$$

$$df_1(G_i^*) = \frac{1}{2\ln 2}\, df_{z2}(G_i^*), \quad i = 1, 2,$$

$$df_1(G_3^*) = \frac{1}{2\ln 2}\, df_{z1}(G_3^*).$$

Finally, using the relation in (38), we can find the partial derivatives of $f_1$ with respect to $U_b^*$, $G_i^*$, $U_w^*$, and $G_3^*$ given in Section III-A.



Appendix B 

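Partial Derivatives of $g_z(Z, Z^*) = \ln\det\Big(I + \sum_{i=1}^{l}\frac{\rho_t}{\sigma_i^2} Y^H H_{bi}^H X_i X_i^H H_{bi} Y\Big)$

Now we consider the MSE matrix $E_z = \big(I + \sum_{i=1}^{l}\frac{\rho_t}{\sigma_i^2} Y^H H_{bi}^H X_i X_i^H H_{bi} Y\big)^{-1}$, where $X_i$ or $Y$ is a function of $Z$ and $\rho_t\,\mathrm{tr}\{YY^H\} = P_S$. First, we find the differential of $g_z(Z, Z^*)$ when $X_i$ is a function of $Z$. The differential of $g_z(Z, Z^*)$ is computed as $dg_z(Z, Z^*) = \mathrm{tr}\{E_z\, dE_z^{-1}\}$, where

$$dE_z^{-1} = \sum_{i=1}^{l}\frac{\rho_t}{\sigma_i^2}\, Y^H H_{bi}^H\big(dX_i\, X_i^H + X_i\, dX_i^H\big) H_{bi} Y. \qquad (47)$$

After some manipulation, we obtain

$$dg_z(Z, Z^*) = \sum_{i=1}^{l}\frac{\rho_t}{\sigma_i^2}\,\mathrm{tr}\{T_{zi} X_i\, dX_i^H + X_i^H T_{zi}\, dX_i\}, \qquad (48)$$

where we define $T_{zi} = H_{bi} Y E_z Y^H H_{bi}^H$.

Second, when $Y$ is a function of $Z$, the differential of $E_z^{-1}$ is computed as

$$dE_z^{-1} = \sum_{i=1}^{l}\left(\frac{d\rho_t}{\sigma_i^2}\, Y^H H_{bi}^H X_i X_i^H H_{bi} Y + \frac{\rho_t}{\sigma_i^2}\, dY^H H_{bi}^H X_i X_i^H H_{bi} Y + \frac{\rho_t}{\sigma_i^2}\, Y^H H_{bi}^H X_i X_i^H H_{bi}\, dY\right), \qquad (49)$$

where $d\rho_t = -\frac{\rho_t^2}{P_S}\,\mathrm{tr}\{dY\, Y^H + Y\, dY^H\}$. Plugging it into $\mathrm{tr}\{E_z\, dE_z^{-1}\}$, the differential of $g_z(Z, Z^*)$ is computed as

$$\begin{aligned} dg_z(Z, Z^*) &= \sum_{i=1}^{l}\frac{1}{\sigma_i^2}\,\mathrm{tr}\Big\{\rho_t H_{bi}^H X_i X_i^H H_{bi} Y E_z\, dY^H - \frac{\rho_t^2}{P_S}\,\mathrm{tr}\{T_{zi} X_i X_i^H\}\, Y\, dY^H\Big\} \\ &\quad + \sum_{i=1}^{l}\frac{1}{\sigma_i^2}\,\mathrm{tr}\Big\{\rho_t E_z Y^H H_{bi}^H X_i X_i^H H_{bi}\, dY - \frac{\rho_t^2}{P_S}\,\mathrm{tr}\{T_{zi} X_i X_i^H\}\, Y^H dY\Big\}. \end{aligned} \qquad (50)$$

As introduced in (38) of Appendix A, we can then find the partial derivative with respect to $Z^*$.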
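Example: Partial Derivatives of $f_2 = \frac{1}{2}(I + I_c)$

Consider $f_2 = \frac{1}{2\ln 2}\big(\ln\det E^{-1} + \ln\det E_c^{-1}\big)$ in Section III-B2. Defining $g_{z1}(Z, Z^*) = \ln\det E_c^{-1}$, we can rewrite $f_2$ as $f_2(Z, Z^*) = \frac{1}{2\ln 2}\big(f_{z1}(Z, Z^*) + g_{z1}(Z, Z^*)\big)$, where we set $l = 2$, $\rho_t = \rho_e$, $H_{bi} = H_{iS}$, $X_i = W_i$, and $Y = T_e$ for $g_{z1}(Z, Z^*)$. The differential of $f_2$ is computed as

$$df_2(Z, Z^*) = \frac{1}{2\ln 2}\big(df_{z1}(Z, Z^*) + dg_{z1}(Z, Z^*)\big). \qquad (51)$$

Since $df_{z1}(Z, Z^*)$ has been considered in Appendix A, we focus on $dg_{z1}(Z, Z^*)$. First, using (48), the differential of $g_{z1}(Z, Z^*)$ can be represented as

$$dg_{z1}(Z, Z^*) = \sum_{i=1}^{2}\frac{\rho_e}{\sigma_i^2}\,\mathrm{tr}\{\Upsilon_i W_i\, dW_i^H + W_i^H \Upsilon_i\, dW_i\}, \qquad (52)$$

where $\Upsilon_i = H_{iS} T_e E_c T_e^H H_{iS}^H$ plays the role of $T_{zi}$. For $Z = U_w^*$, we do not need to consider the second term, which involves $dW_i$, since we need only the differential with respect to $U_w^*$. Substituting the differential of $W_i^H$ with respect to $U_w^*$ yields

$$dg_{z1}(U_w^*) = \sum_{i=1}^{2}\frac{\rho_e}{\sigma_i^2}\,\mathrm{tr}\{\Upsilon_i W_i\, dW_i^H\}\big|_{Z = U_w^*}. \qquad (53)$$

Second, using (50) for $Z = T_e$, $dg_{z1}(T_e^*)$ can be computed as

$$dg_{z1}(T_e^*) = \sum_{i=1}^{2}\frac{1}{\sigma_i^2}\,\mathrm{tr}\Big\{\rho_e H_{iS}^H W_i W_i^H H_{iS} T_e E_c\, dT_e^H - \frac{\rho_e^2}{P_S}\,\mathrm{tr}\{\Upsilon_i W_i W_i^H\}\, T_e\, dT_e^H\Big\}. \qquad (54)$$

Finally, the differentials of $f_2$ with respect to $U_w^*$ and $T_e^*$ can be computed as

$$df_2(U_w^*) = \frac{1}{2\ln 2}\big(df_{z1}(U_w^*) + dg_{z1}(U_w^*)\big), \qquad df_2(T_e^*) = \frac{1}{2\ln 2}\, dg_{z1}(T_e^*).$$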
TABLE I
Protocol

Transmission type / time slot    Even time slot          Odd time slot
Source transmission              S → $R_1$, $R_2$        S → $R_3$
Relaying                         $R_3$ → D               $R_1$, $R_2$ → D

Fig. 1. The proposed dual-hop half-duplex protocol.







[Figure: x-axis SNR ($\rho$) [dB]]

Fig. 2. Comparison of outage probability among five different schemes for the slow fading channel under $M = 4$ and $I_{out} = 2$ [bits/s/Hz].
Fig. 3. Comparison of $\epsilon$-outage sum-rate among four different schemes for the slow fading channel under $M = 2$ (dashed line) and $M = 4$ (solid line).
[Figure panels: (a) $M = 2$, (b) $M = 4$; x-axis SNR ($\rho$) [dB]]

Fig. 4. Comparison of the sum-rate and capacity pre-log factor among three different schemes for the block fading channel per two time slots.
[Figure: x-axis SNR ($\rho$) [dB]]

Fig. 5. Comparison of ergodic sum-rate among three different linear filters based on the proposed protocol for the block fading channel per two time slots.
Fig. 6. Comparison of ergodic sum-rate among three different linear filters based on the proposed protocol for the block fading channel per one time slot in the case of $M = 2$ (dashed line) and $M = 4$ (solid line).
Fig. 7. Comparison of ergodic sum-rate among three different linear filters based on the proposed protocol for the block fading channel per one time slot in the case of SNR = 20 dB (dashed line) and SNR = 30 dB (solid line).
Fig. 8. Convergence behavior of three different proposed algorithms for SNR = 30 dB.